Abstract



This paper considers a binary channel with deletions and insertions, where each input bit is transformed
in one of the following ways: it is deleted with probability d, or an extra bit is added after it with probability
i, or it is transmitted unmodified with probability 1 - d - i. A computable lower bound on the capacity
of this channel is derived. The transformation of the input sequence by the channel may be viewed in 
terms of runs as follows: some runs of the input sequence get shorter/longer, some runs get deleted, and 
some new runs are added. It is difficult for the decoder to synchronize the channel output sequence to 
the transmitted codeword mainly due to deleted runs and new inserted runs. We consider a decoder that 
decodes the positions of the deleted and inserted runs in addition to the transmitted codeword. Analyzing 
the performance of such a decoder leads to a computable lower bound on the capacity. The bounds proposed 
in this paper provide the first characterization of achievable rates for channels with general insertions, and 
for channels with both deletions and insertions. For the special cases of deletion channels and duplication 
channels where previous results exist, our rates are very close to the best-known capacity lower bounds. 



1 Introduction 

Consider a binary input channel where for each bit (denoted $x$), the output is generated in one of the following ways:

• The bit is deleted with probability $d$,

• An extra bit is inserted after $x$ with probability $i$. The extra bit is equal to $x$ (a duplication) with probability $\alpha$, and equal to $\bar{x}$ (a complementary insertion) with probability $1 - \alpha$,

• No deletions or insertions occur, and the output is $x$ with probability $1 - d - i$.

*This work was partially supported by NSF Grant CCF-1017744. It will be presented in part at the 2011 IEEE International Symposium on Information Theory.

The channel acts independently on each bit. We refer to this channel as the deletion+insertion channel with parameters $(d, i, \alpha)$. If the input to the channel is a sequence of $n$ bits, the length of the output sequence will be close to $n(1 + i - d)$ for large $n$ due to the law of large numbers.
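The channel model above can be illustrated with a short simulation. This is a minimal sketch, not part of the paper: the function name `deletion_insertion_channel` and its argument names are chosen here for illustration, with `d`, `i`, and `alpha` standing for the deletion probability, the insertion probability, and the duplication fraction.

```python
import random

def deletion_insertion_channel(x, d, i, alpha, rng=None):
    """Pass the bit list x through the deletion+insertion channel.

    Each input bit is deleted with probability d, followed by an inserted
    bit with probability i, or sent unmodified with probability 1 - d - i.
    An inserted bit duplicates the input bit with probability alpha and is
    its complement otherwise.
    """
    rng = rng or random.Random()
    y = []
    for bit in x:
        u = rng.random()
        if u < d:
            continue                 # the bit is deleted
        y.append(bit)                # the bit itself is received
        if u < d + i:                # an extra bit is inserted after it
            is_duplication = rng.random() < alpha
            y.append(bit if is_duplication else 1 - bit)
    return y
```

For a long input, `len(y) / len(x)` should be close to `1 + i - d`, in line with the law-of-large-numbers remark above.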

Channels with synchronization errors can be used to model timing mismatch in communication systems. 
Channels with deletions and insertions also occur in magnetic recording [1]. The problem of synchronization
also appears in file backup and file sharing [2, 3], where distributed nodes may have different versions of the
same file which differ by a small number of edits. The edits may include deletions, insertions, and substitutions. 
The minimum communication rate required to synchronize the remote sources is closely related to the capacity 
of an associated synchronization channel. This connection is discussed at the end of this paper. 

The above model with $i = 0$ corresponds to the deletion channel, which has been studied in several recent
papers, e.g., [4-11]. When $d = 0$, we obtain the insertion channel with parameters $(i, \alpha)$. The insertion channel
with $\alpha = 1$ is the sticky channel [12], where all insertions are duplications.

In this paper, we obtain lower bounds on the capacity of the deletion+insertion channel. Our starting 
point is the result of Dobrushin [13] for general synchronization channels, which states that the capacity is
given by the maximum of the mutual information per bit between the input and output sequences. There are 
two challenges to computing the capacity through this characterization. The first is evaluating the mutual 
information, which is a difficult task because of the memory inherent in the joint distribution of the input and 
output sequences. The second challenge is to optimize the mutual information over all input distributions. 

In this work, we restrict the input distribution to the class of first-order Markov processes and focus
on the problem of evaluating the mutual information. It is known that first-order Markov input distributions
yield good capacity lower bounds for the deletion channel [4, 5] and the sticky channel [12], both special cases
of the deletion+insertion channel. This suggests they are likely to perform well on the deletion+insertion
channel as well. Our approach is nevertheless quite general, and can be extended to obtain capacity lower bounds
with other input distributions such as higher-order Markov distributions, and distributions with independent
run-lengths.

For a synchronization channel, it is useful to think of the input and output sequences in terms of runs of
symbols rather than individual symbols. (The runs of a binary sequence are its alternating blocks of contiguous
zeros and ones.) If there were a one-to-one correspondence between the runs of the input sequence $\underline{X}$ and those
of the output sequence $\underline{Y}$, we could write the conditional distribution $P(\underline{Y}|\underline{X})$ as a product distribution of
run-length transformations; computing the mutual information would then be straightforward. Unfortunately,
such a correspondence is not possible since deletions can lead to some runs being lost, and insertions to new
runs being inserted.

The main idea of the paper is to use auxiliary sequences which indicate the positions (in the output sequence) 
where runs were deleted and inserted. Consider a decoder that first decodes the auxiliary sequences, and then 
the transmitted codeword. Conditioned on the knowledge of the auxiliary sequences, the mutual information 
between the input and output sequences is a single-letter quantity which can be calculated easily. However, 
decoding the auxiliary sequences in addition to the codeword is sub-optimal and incurs a rate penalty, which 
we upper bound. The challenge is to define auxiliary sequences that lead to analytical lower bounds on the 
capacity with minimal rate penalty. 

To gain insight, we first consider the special cases of the insertion channel and the deletion channel separately.
The insertion channel with parameters $(i, \alpha)$ introduces approximately $ni$ insertions in a sufficiently
long input sequence of length $n$. A fraction nearly $\alpha$ of these insertions are duplications, and the rest are
complementary insertions. Note that new runs can only be introduced by complementary insertions. We
consider a decoder that first decodes the positions of the complementary insertions. The decoder can then flip
the bits at these positions to obtain a one-to-one correspondence between input and output runs.

For the deletion channel, we use a decoder that first decodes an auxiliary sequence whose symbols indicate 
the number of runs deleted between each pair of adjacent bits in the output sequence. Augmenting the 
output sequence with the positions of deleted runs results in a one-to-one correspondence between input and 
output runs. For the deletion+insertion channel, we consider a decoder that decodes both auxiliary sequences
described above. 

The main contributions of the paper are the following: 

1. Theorems 1 and 2 together provide the first characterization of achievable rates for the general insertion
channel. For the special case of the 'sticky' channel ($\alpha = 1$, i.e., only duplications), the rates of Theorem
2 are very close to the near-optimal lower bound given in [12].

2. Theorem 4 provides the first characterization of achievable rates for the deletion+insertion channel. For
the special case of the deletion channel ($i = 0$), these rates are close to the best-known lower bounds
given in [5].

3. Our approach provides a general framework to compute the capacity of channels with synchronization
errors, and suggests several directions to obtain sharper capacity bounds. For example, results on the
structure of optimal input distributions for these channels (in the spirit of [10, 11]) could be combined
with our approach to improve the lower bounds. One could also obtain upper bounds on the capacity
by assuming that the auxiliary sequences are available 'for free' at the decoder, as done in [7] for the
deletion channel.

For clarity, we only consider the binary deletion+insertion channel in this paper. The results presented here
can be extended to channels with any finite alphabet.



1.1 Related Work 

Dobrushin's capacity characterization was used in [6] to obtain bounds on the deletion capacity. The 'jigsaw'
decoding approach of [5] is interpreted in [6] in terms of an auxiliary sequence that associates each run of
the output sequence with one or more runs of the input codeword. Using this, the mutual information is
decomposed such that one part of it represents the rate achieved by the jigsaw decoder, and the other part
represents the rate loss due to the jigsaw decoder. This rate loss is hard to compute, and is estimated via
simulation for a few values of the deletion probability in [6].

Our approach is motivated by the observation that synchronizing the output with the transmitted sequence
is difficult mainly due to runs being completely deleted and new runs being inserted. Accordingly, the auxiliary
sequences we use indicate the positions of these runs in the output sequence, leading to a mutual information
decomposition that is quite different from [6]. The approach of [5], [6] provides the best-known lower bounds
on the deletion capacity, but is specific to channels with only deletions and duplications. Our techniques
are general and apply to channels with both deletions and insertions, while giving bounds very close to the
best-known ones for deletion and duplication channels.

Dobrushin's capacity characterization was also used in [11] to estimate the deletion capacity and the
structure of the optimal input distribution for small values of deletion probability. In [7], a genie-aided decoder
with access to the locations of deleted runs was used to upper bound the deletion capacity using an equivalent
discrete memoryless channel (DMC). In [9], bounds on the deletion capacity were obtained by considering
a decoder equipped with side-information specifying the number of output bits corresponding to successive
blocks of $L$ input bits, for any positive integer $L$. This new channel is equivalent to a DMC with an input
alphabet of size $2^L$, whose capacity can be numerically computed using the Blahut-Arimoto algorithm (for as
large a value of $L$ as computationally feasible). The upper bound in [9] is the best known for a wide range of
deletion probabilities, but the lower bound is weaker than that of [5] and the one proposed here. Finally, we
note that a different channel model with bit flips and synchronization errors was studied in [14, 15]. In this
model, an insertion is defined as an input bit being replaced by two random bits. We have only mentioned
the papers that are closely related to the results of this work. The reader is referred to ^ for an exhaustive 
list of references on synchronization channels. 

After laying down the formal definitions and technical machinery in Section 2, we describe two coding
schemes in Section 3 which give intuition about our bounding techniques. In Section 4, we consider the
insertion channel ($d = 0$) and derive two lower bounds on its capacity. For this channel, previous bounds
exist only for the special case of sticky channels ($\alpha = 1$) [12]. We derive a lower bound on the capacity of
the deletion channel ($i = 0$) in Section 5 and compare it with the best known lower bound. In Section 6, we
combine the ideas of Sections 4 and 5 to obtain a lower bound for the deletion+insertion channel. Section 7
concludes the paper with a discussion of open questions.



2 Preliminaries 

Notation: $\mathbb{N}_0$ denotes the set of non-negative integers, and $\mathbb{N}$ the set of natural numbers. $h(\cdot)$ is the binary
entropy function, and $\mathbf{1}_A$ is the indicator function of the set $A$. For any $0 < a < 1$, $\bar{a} = 1 - a$. Logarithms are
with base 2, and entropy is measured in bits. We use uppercase letters to denote random variables, bold-face
letters for random processes, and superscript notation to denote random vectors. Thus the channel input
sequence of length $n$ is denoted $X^n = (X_1, \ldots, X_n)$. The corresponding output sequence at the decoder has
length $M_n$ (a random variable determined by the channel realization), and is denoted $Y^{M_n}$. For brevity, we
sometimes use underlined notation for random vectors when we do not need to be explicit about their length.
Thus $\underline{X} \triangleq X^n = (X_1, X_2, \ldots, X_n)$, and $\underline{Y} \triangleq Y^{M_n} = (Y_1, \ldots, Y_{M_n})$.

The communication over the channel is characterized by three random processes defined over the same
probability space: the input process $\mathbf{X} = \{X_n\}_{n \ge 1}$, the output process $\mathbf{Y} = \{Y_n\}_{n \ge 1}$, and $\mathbf{M} = \{M_n\}_{n \ge 1}$,
where $M_n$ is the number of output symbols corresponding to the first $n$ input symbols. If the underlying
probability space is $(\Omega, \mathcal{F}, P)$, each realization $\omega \in \Omega$ determines the sample paths $\mathbf{X}(\omega) = \{X_n(\omega)\}_{n \ge 1}$,
$\mathbf{Y}(\omega) = \{Y_n(\omega)\}_{n \ge 1}$, and $\mathbf{M}(\omega) = \{M_n(\omega)\}_{n \ge 1}$.

Definition 2.1. An $(n, 2^{nR})$ code with block length $n$ and rate $R$ consists of

1. An encoder mapping $e : \{1, \ldots, 2^{nR}\} \to \{0, 1\}^n$, and

2. A decoder mapping $g : \Sigma \to \{1, \ldots, 2^{nR}\}$, where $\Sigma$ is $\bigcup_{k=0}^{n} \{0, 1\}^k$ for the deletion channel, $\bigcup_{k=n}^{2n} \{0, 1\}^k$
for the insertion channel, and $\bigcup_{k=0}^{2n} \{0, 1\}^k$ for the deletion+insertion channel.

Assuming the message $W$ is drawn uniformly on the set $\{1, \ldots, 2^{nR}\}$, the probability of error of an $(n, 2^{nR})$
code is

$$P_{e,n} = \frac{1}{2^{nR}} \sum_{l=1}^{2^{nR}} \Pr\left(g(Y^{M_n}) \neq l \mid W = l\right).$$

A rate $R$ is achievable if there exists a sequence of $(n, 2^{nR})$ codes such that $P_{e,n} \to 0$ as $n \to \infty$. The supremum
of all achievable rates is the capacity $C$. The following characterization of capacity follows from a result proved
for a general class of synchronization channels by Dobrushin [13].

Fact 1. Let $C_n = \max_{P_{X^n}} \frac{1}{n} I(X^n; Y^{M_n})$. Then $C = \lim_{n \to \infty} C_n$ exists, and is equal to the capacity of the
deletion+insertion channel.

Proof. Dobrushin proved the following general result in [13]. Consider a channel with $\mathcal{X}$ and $\mathcal{Y}$ denoting the
alphabets of possible symbols at the input and output, respectively. For each input symbol in $\mathcal{X}$, the output
belongs to $\bar{\mathcal{Y}}$, the set of all finite sequences of elements of $\mathcal{Y}$, including the empty sequence. The channel is
memoryless and is specified by the stochastic matrix $\{P(\bar{y}|x),\ \bar{y} \in \bar{\mathcal{Y}},\ x \in \mathcal{X}\}$. Also assume that for each input
symbol $x$, the length of the (possibly empty) output sequence has non-zero expected value. Then $\lim_{n \to \infty} C_n$
exists, and is equal to the capacity of the channel.

The deletion+insertion channel is a special case of the above model with $\mathcal{X} = \mathcal{Y} = \{0, 1\}$, and the length
of the output corresponding to any input symbol has a maximum value of two and expected value equal to
$(1 - d + i)$, which is non-zero for all $d < 1$. Hence the claim is a direct consequence of Dobrushin's result. □

In this paper, we restrict the input process to the class of binary symmetric first-order Markov processes
and focus on evaluating the mutual information. This will give us a lower bound on the capacity. The input
process $\mathbf{X} = \{X_n\}_{n \ge 1}$ is characterized by the following distribution for all $n$:

$$P(X_1, \ldots, X_n) = P(X_1) \prod_{j=2}^{n} P(X_j | X_{j-1}),$$

with

$$P(X_1 = 0) = P(X_1 = 1) = 0.5, \quad P(X_j = 1 | X_{j-1} = 1) = P(X_j = 0 | X_{j-1} = 0) = \gamma, \quad j > 1. \tag{1}$$

A binary sequence may be represented by a sequence of positive integers representing the lengths of its runs,
and the value of the first bit (to indicate whether the first run has zeros or ones). For example, the sequence
0001100000 can be represented as (3, 2, 5) if we know that the first bit is 0. The value of the first bit of $\mathbf{X}$ can
be communicated to the decoder with vanishing rate, and we will assume this has been done at the outset.
Hence, denoting the length of the $j$th run of $\mathbf{X}$ by $L_j^X$, we have the following equivalence: $\mathbf{X} \leftrightarrow (L_1^X, L_2^X, \ldots)$.
For a first-order Markov binary source of (1), the run-lengths are independent and geometrically distributed,
i.e.,

$$\Pr(L_j^X = r) = \gamma^{r-1}(1 - \gamma), \quad r = 1, 2, \ldots \tag{2}$$
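The run-length representation and the Markov input process can be sketched in a few lines. The helper names `run_lengths` and `markov_source` are illustrative and not from the paper; `gamma` plays the role of the Markov parameter in (1).

```python
import random
from itertools import groupby

def run_lengths(bits):
    """Lengths of the maximal runs of identical symbols, in order."""
    return [len(list(group)) for _, group in groupby(bits)]

def markov_source(n, gamma, rng):
    """Sample n bits from the binary symmetric first-order Markov process:
    uniform first bit, and each later bit repeats the previous one with
    probability gamma."""
    x = [rng.randrange(2)]
    while len(x) < n:
        previous = x[-1]
        x.append(previous if rng.random() < gamma else 1 - previous)
    return x

# The example from the text: 0001100000 has run-lengths (3, 2, 5).
example_runs = run_lengths("0001100000")
```

For a long sample, `len(run_lengths(x)) / len(x)` should be close to `1 - gamma`, consistent with geometrically distributed run-lengths of mean `1 / (1 - gamma)`.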

The average length of a run in $\mathbf{X}$ is $\frac{1}{1-\gamma}$, so the number of runs in a sequence of length $n$ is close to $n(1 - \gamma)$
for large $n$. Our bounding techniques aim to establish a one-to-one correspondence between input runs and
output runs. The independence of run-lengths of $\mathbf{X}$ enables us to obtain analytical bounds on the capacity.
We denote by $I_P(X^n; Y^{M_n})$, $H_P(X^n)$, $H_P(X^n | Y^{M_n})$ the mutual information and entropies computed with the
channel input sequence $X^n$ distributed as in (1). For all $n$, we have

$$C_n = \max_{P_{X^n}} \frac{1}{n} I(X^n; Y^{M_n}) \ge \frac{1}{n} I_P(X^n; Y^{M_n}). \tag{3}$$

Therefore

$$C \ge \liminf_{n \to \infty} \frac{1}{n} I_P(X^n; Y^{M_n}) = h(\gamma) - \limsup_{n \to \infty} \frac{1}{n} H_P(X^n | Y^{M_n}). \tag{4}$$

We will derive upper bounds on $\limsup_{n \to \infty} \frac{1}{n} H_P(X^n | Y^{M_n})$ and use them in (4) to obtain a lower bound on the
capacity.

2.1 Technical Lemmas 

To formally prove our results, we will use a framework similar to [6]. The notion of uniform integrability will
play an important role, and we list the relevant definitions and technical lemmas below.



Definition 2.2. [16] A family of random variables $\{Z_n\}_{n \ge 1}$ is uniformly integrable if

$$\lim_{a \to \infty} \sup_n E\left[|Z_n| \mathbf{1}_{\{|Z_n| \ge a\}}\right] = 0.$$

Lemma 2.1. A family of random variables $\{Z_n\}_{n \ge 1}$ is uniformly integrable if and only if:

1. $\sup_n E[|Z_n|] < \infty$, and

2. For any $\epsilon > 0$, there exists some $\delta > 0$ such that for all $n$ and any event $A$ with $\Pr(A) < \delta$, we have
$E[|Z_n| \mathbf{1}_A] < \epsilon$.

Let $\mathrm{Supp}(W|Z)$ denote the random variable whose value is the size of the support of the conditional
distribution of $W$ given $Z$.

Lemma 2.2. [16, Lemma 4] Let $\{W_n, Z_n\}_{n \ge 1}$ be a sequence of pairs of discrete random variables with
$\mathrm{Supp}(W_n | Z_n) \le c^n$ for some constant $c > 1$. Then $\sup_n E\left[\left(-\frac{1}{n} \log \Pr(W_n | Z_n)\right)^2\right] < \infty$. In particular, the
sequence $\left\{-\frac{1}{n} \log \Pr(W_n | Z_n)\right\}_{n \ge 1}$ is uniformly integrable.

Lemma 2.3. [16] Suppose that $\{Z_n : n \ge 1\}$ is a sequence of random variables that converges to $Z$ in
probability. Then the following are equivalent.

1. $\{Z_n : n \ge 1\}$ is uniformly integrable.

2. $E[|Z_n|] < \infty$ for all $n$, and $E[|Z_n|] \to E[|Z|]$ as $n \to \infty$.

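Lemma 2.4. Let $\mathbf{Z} = \{Z_n\}_{n \ge 1}$ be a process for which the asymptotic equipartition property (AEP) holds, i.e.,

$$\lim_{n \to \infty} -\frac{1}{n} \log \Pr(Z_1, \ldots, Z_n) = H(\mathbf{Z}) \quad \text{a.s.},$$

where $H(\mathbf{Z})$ is the (finite) entropy rate of the process $\mathbf{Z}$. Let $\{M_n\}_{n \ge 1}$ be a sequence of positive integer valued
random variables defined on the same probability space as the $Z_n$'s, and suppose that $\lim_{n \to \infty} \frac{M_n}{n} = x$ almost
surely for some constant $x$. Then

$$\lim_{n \to \infty} -\frac{1}{n} \log \Pr(Z_1, \ldots, Z_{M_n}) = x H(\mathbf{Z}) \quad \text{a.s.}$$

Proof. Fix any $\epsilon > 0$. Since $\lim_{n \to \infty} \frac{M_n}{n} = x$, there exists an $L(\epsilon)$ such that for all $n > L(\epsilon)$, we have almost
surely

$$a(n, \epsilon) \triangleq \lfloor n(x - \epsilon) \rfloor \le M_n \le \lceil n(x + \epsilon) \rceil \triangleq b(n, \epsilon).$$

It follows that for all $n > L(\epsilon)$,

$$-\frac{1}{n} \log \Pr(Z_1, \ldots, Z_{M_n}) \ge -\frac{1}{n} \log \Pr(Z_1, \ldots, Z_{a(n,\epsilon)}) = \frac{a(n, \epsilon)}{n} \cdot \frac{-\log \Pr(Z_1, \ldots, Z_{a(n,\epsilon)})}{a(n, \epsilon)}. \tag{5}$$

Hence,

$$\liminf_{n \to \infty} -\frac{1}{n} \log \Pr(Z_1, \ldots, Z_{M_n}) \ge \lim_{n \to \infty} \frac{a(n, \epsilon)}{n} \cdot \lim_{n \to \infty} \frac{-\log \Pr(Z_1, \ldots, Z_{a(n,\epsilon)})}{a(n, \epsilon)} = (x - \epsilon) H(\mathbf{Z}). \tag{6}$$

Similarly, one can show that

$$\limsup_{n \to \infty} -\frac{1}{n} \log \Pr(Z_1, \ldots, Z_{M_n}) \le (x + \epsilon) H(\mathbf{Z}). \tag{7}$$

Since $\epsilon > 0$ is arbitrary, combining (6) and (7), we get the result of the lemma. □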

3 Coding Schemes 

In this section, we describe coding schemes to give intuition about the auxiliary sequences used to obtain the
bounds. The discussion is informal; the capacity bounds are rigorously proved in the following sections using a
different technique: the auxiliary sequences are used to directly bound the limiting behavior of $\frac{1}{n} H_P(X^n | Y^{M_n})$
using information-theoretic inequalities and elementary tools from analysis.

3.1 Insertion Channel 

Consider the insertion channel with parameters $(i, \alpha)$. For $\alpha < 1$, the inserted bits may create new runs,
so we cannot associate each run of $\underline{Y}$ with a run in $\underline{X}$. For example, let

$$\underline{X} = 000111000 \quad \text{and} \quad \underline{Y} = 00\,\mathbf{1}\,0111\,\mathbf{0}\,000\,\mathbf{0}, \tag{8}$$

where the inserted bits are shown in bold. There is one duplication (in the third run), and two
complementary insertions (in the first and second runs). While a duplication never introduces a new run,
a complementary insertion introduces a new run, except when it occurs at the end of a run of $\underline{X}$ (e.g.,
the 0 inserted at the end of the second run in (8)). For any input-output pair $(X^n, Y^{M_n})$, define an auxiliary
sequence $T^{M_n} = (T_1, \ldots, T_{M_n})$ where $T_j = 1$ if $Y_j$ is a complementary insertion, and $T_j = 0$ otherwise.
The sequence $T^{M_n}$ indicates the positions of the complementary insertions in $Y^{M_n}$. In the example of (8),
$T^{M_n} = (0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0)$.
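The role of the auxiliary sequence can be checked directly on this example. The sketch below assumes the output in (8) reads 001011100000 (the reading consistent with the stated pattern of complementary insertions at positions 3 and 8); the helper names are illustrative. Flipping the output bits where the auxiliary sequence equals 1 turns every complementary insertion into a duplication, so the result has as many runs as the input.

```python
from itertools import groupby

def count_runs(bits):
    """Number of maximal runs of identical symbols."""
    return sum(1 for _ in groupby(bits))

def flip_complementary(y, t):
    """Flip y[j] wherever t[j] == 1 (the complementary insertions)."""
    return [bit ^ flag for bit, flag in zip(y, t)]

x = [0, 0, 0, 1, 1, 1, 0, 0, 0]
y = [0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0]   # assumed reading of the output in (8)
t = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]   # auxiliary sequence from the text

y_flipped = flip_complementary(y, t)
```

Here `y` has five runs, whereas `y_flipped` and `x` both have three, so the runs of `y_flipped` can be matched one-to-one with those of `x`.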

Consider the following coding scheme. Construct a codebook of $2^{nR}$ codewords of length $n$, each chosen
independently according to the first-order Markov distribution (1). Let $X^n$ denote the transmitted codeword,
and $Y^{M_n}$ the channel output. From $Y^{M_n}$, the decoder decodes (using joint typicality) the positions of the
complementary insertions, in addition to the input sequence. The joint distribution of these sequences is
determined by the input distribution (1) and the channel parameters $(i, \alpha)$.

Such a decoder is sub-optimal since the complementary insertion pattern $T^{M_n}$ is not unique given an
input-output pair $(X^n, Y^{M_n})$. This is discussed in Section 4.2. The maximum rate achievable by this decoder
is obtained by analyzing the probability of error. Assuming all sequences satisfy the asymptotic equipartition
property [17], we have for sufficiently large $n$

$$\Pr(\text{error}) \le 2^{nR + H(T^{M_n} | X^n)} \cdot 2^{-I(X^n T^{M_n}; Y^{M_n})}. \tag{9}$$

The second term above is the probability that $(X^n, T^{M_n}, Y^{M_n})$ are jointly typical when $Y^{M_n}$ is picked
independently from $(X^n, T^{M_n})$. The first term is obtained by taking a union bound over all the codewords and all
the typical complementary insertion patterns for each codeword. Hence the probability of error goes to zero if

$$R < \frac{1}{n} \left( I(X^n T^{M_n}; Y^{M_n}) - H(T^{M_n} | X^n) \right) = \frac{1}{n} \left( H(X^n) - H(T^{M_n} | Y^{M_n}) - H(X^n | T^{M_n}, Y^{M_n}) \right). \tag{10}$$

We obtain a lower bound on the capacity in Section 4.2 by obtaining good single-letter bounds on the limiting
behavior of both $\frac{1}{n} H(T^{M_n} | Y^{M_n})$ and $\frac{1}{n} H(X^n | T^{M_n}, Y^{M_n})$.
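The equality between the two forms of the rate condition in (10) follows from the chain rule for entropy. As a short worked derivation, writing $T$ for $T^{M_n}$, $X$ for $X^n$ and $Y$ for $Y^{M_n}$:

```latex
\begin{align*}
I(X T; Y) - H(T \mid X)
  &= H(X, T) - H(X, T \mid Y) - H(T \mid X) \\
  &= H(X) - H(X, T \mid Y) \\
  &= H(X) - H(T \mid Y) - H(X \mid T, Y).
\end{align*}
```

The second line uses $H(X, T) = H(X) + H(T \mid X)$, and the third expands $H(X, T \mid Y)$ by the chain rule.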

3.2 Deletion Channel 

For the deletion channel with deletion probability $d$, consider the following pair of input and output sequences:
$\underline{X} = 000111000$, $\underline{Y} = 0010$. For this pair, we can associate each run of $\underline{Y}$ uniquely with a run in $\underline{X}$. Therefore,
we can write

$$P(\underline{Y} = 0010 \mid \underline{X} = 000111000) = P(L_1^Y = 2 | L_1^X = 3)\, P(L_2^Y = 1 | L_2^X = 3)\, P(L_3^Y = 1 | L_3^X = 3),$$

where $L_j^X, L_j^Y$ denote the lengths of the $j$th runs of $\underline{X}$ and $\underline{Y}$, respectively. We observe that if no runs in $\underline{X}$ are
completely deleted, then the conditional distribution of $\underline{Y}$ given $\underline{X}$ may be written as a product distribution
of run-length transformations:

$$P(\underline{Y} | \underline{X}) = P(L_1^Y | L_1^X)\, P(L_2^Y | L_2^X)\, P(L_3^Y | L_3^X) \cdots \tag{11}$$

where for all runs $j$, $P(L_j^Y = s | L_j^X = r) = \binom{r}{s} d^{r-s} (1 - d)^s$ for $1 \le s \le r$. In general, we do have runs of $\underline{X}$
that are completely deleted. For example, if $\underline{X} = 000111000$ and $\underline{Y} = 000$, we cannot associate the single run
in $\underline{Y}$ uniquely with a run in $\underline{X}$.
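The product form can be verified by brute force on the first example. The sketch below (function names are illustrative) sums the probability of every deletion pattern that maps the input 000111000 to the output 0010 and compares it with the product of binomial run-length transition probabilities.

```python
from itertools import product
from math import comb

def run_transition(r, s, d):
    """P(output run length = s | input run length = r) for 1 <= s <= r."""
    return comb(r, s) * d ** (r - s) * (1 - d) ** s

def brute_force_probability(x, y, d):
    """P(y | x) for the deletion channel, summing over all deletion patterns."""
    total = 0.0
    for keep in product([0, 1], repeat=len(x)):
        received = [bit for bit, kept in zip(x, keep) if kept]
        if received == y:
            n_kept = sum(keep)
            total += (1 - d) ** n_kept * d ** (len(x) - n_kept)
    return total

d = 0.3
x = [0, 0, 0, 1, 1, 1, 0, 0, 0]
y = [0, 0, 1, 0]
exact = brute_force_probability(x, y, d)
product_form = (run_transition(3, 2, d)
                * run_transition(3, 1, d)
                * run_transition(3, 1, d))
```

The two values agree because every run of the input survives in this output; the same check fails to apply when the output is 000, where no run-by-run association exists.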

For any input-output pair $(X^n, Y^{M_n})$, define an auxiliary sequence $S^{M_n+1} = (S_1, S_2, \ldots, S_{M_n+1})$, where
$S_j \in \mathbb{N}_0$ is the number of runs completely deleted in $X^n$ between the bits corresponding to $Y_{j-1}$ and $Y_j$. ($S_1$ is
the number of runs deleted before the first output symbol $Y_1$, and $S_{M_n+1}$ is the number of runs deleted after
the last output symbol $Y_{M_n}$.) For example, if $\underline{X} = 000111000$ and $\underline{Y} = 000$ is obtained by keeping two bits of the
first run and one bit of the third run (all other bits being deleted), then $\underline{S} = (0, 0, 1, 0)$. On the other hand, if the
last six bits of $\underline{X}$ were all deleted, then $\underline{S} = (0, 0, 0, 2)$. Thus $\underline{S}$ is not uniquely determined given $(\underline{X}, \underline{Y})$. The
auxiliary sequence $\underline{S}$ enables us to augment $\underline{Y}$ with the positions of missing runs. As will be explained in Section 5, the runs of this augmented
output sequence are in one-to-one correspondence with the runs of the input sequence.
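The non-uniqueness of the auxiliary sequence can be made concrete by enumerating all deletion patterns for the example above. This is an illustrative sketch (the helper names are not from the paper): for each pattern that maps the input 000111000 to the output 000, it counts the completely deleted runs falling in each gap between consecutive surviving bits.

```python
from itertools import groupby, product

def run_spans(x):
    """Half-open index ranges (start, end) of the runs of x."""
    spans, start = [], 0
    for _, group in groupby(x):
        length = len(list(group))
        spans.append((start, start + length))
        start += length
    return spans

def aux_sequence(x, keep):
    """Deleted-run counts (S_1, ..., S_{M+1}) for a given keep/delete pattern."""
    kept_positions = [p for p, kept in enumerate(keep) if kept]
    s = [0] * (len(kept_positions) + 1)
    for start, end in run_spans(x):
        if not any(keep[start:end]):              # run completely deleted
            # gap index = number of surviving bits located before this run
            gap = sum(1 for p in kept_positions if p < start)
            s[gap] += 1
    return tuple(s)

def all_aux_sequences(x, y):
    """Every auxiliary sequence compatible with the input-output pair (x, y)."""
    found = set()
    for keep in product([0, 1], repeat=len(x)):
        if [bit for bit, kept in zip(x, keep) if kept] == y:
            found.add(aux_sequence(x, keep))
    return found

candidates = all_aux_sequences([0, 0, 0, 1, 1, 1, 0, 0, 0], [0, 0, 0])
```

For this pair, `candidates` contains both (0, 0, 1, 0) and (0, 0, 0, 2), along with the other patterns obtained by moving the surviving bits between the first and third runs.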

Consider the following coding scheme. Construct a codebook of $2^{nR}$ codewords of length $n$, each chosen
independently according to (1). The decoder receives $Y^{M_n}$, and decodes (using joint typicality) both the
auxiliary sequence and the input sequence. Such a decoder is sub-optimal since the auxiliary sequence $S^{M_n+1}$
is not unique given a codeword $X^n$ and the output $Y^{M_n}$. Assuming all sequences satisfy the asymptotic
equipartition property, we have for sufficiently large $n$

$$\Pr(\text{error}) \le 2^{nR + H(S^{M_n+1} | X^n)} \cdot 2^{-I(X^n S^{M_n+1}; Y^{M_n})}. \tag{12}$$

The second term above is the probability that $(X^n, S^{M_n+1}, Y^{M_n})$ are jointly typical when $Y^{M_n}$ is picked
independently from $(X^n, S^{M_n+1})$. The first term is obtained by taking a union bound over all the codewords
and all the typical auxiliary sequences for each codeword. Hence the probability of error goes to zero if

$$R < \frac{1}{n} \left( I(X^n S^{M_n+1}; Y^{M_n}) - H(S^{M_n+1} | X^n) \right) = \frac{1}{n} \left( H(X^n) - H(S^{M_n+1} | Y^{M_n}) - H(X^n | S^{M_n+1}, Y^{M_n}) \right). \tag{13}$$

In Section 5, we show that the above expression converges as $n \to \infty$, and obtain an analytical expression for
the limit.

For the deletion+insertion channel, we use both the auxiliary sequences, $T^{M_n}$ and $S^{M_n+1}$. The decoder
decodes both these sequences in addition to the codeword $X^n$, and the maximum achievable rate is given by

$$R < \frac{1}{n} \left( H(X^n) - H(T^{M_n}, S^{M_n+1} | Y^{M_n}) - H(X^n | S^{M_n+1}, T^{M_n}, Y^{M_n}) \right). \tag{14}$$

We obtain a lower bound on the capacity of the deletion+insertion channel in Section 6 by analyzing the
limiting behavior of (14).



4 Insertion Channel 

In this channel, an extra bit may be inserted after each bit of $\underline{X}$ with probability $i \in (0, 1)$. When a bit
is inserted after $X_j$, the inserted bit is equal to $X_j$ (a duplication) with probability $\alpha$, and equal to $\bar{X}_j$ (a
complementary insertion) with probability $1 - \alpha$. When $\alpha = 1$, we have only duplications; this is the sticky
channel studied in [12]. In this case, we can associate each run of $\underline{Y}$ with a unique run in $\underline{X}$, which leads to
a computable single-letter characterization of the best achievable rates with a first-order Markov distribution.
We derive two lower bounds on the capacity of the insertion channel, each using a different auxiliary sequence.



4.1 Lower Bound 1 

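For any input-output pair $(X^n, Y^{M_n})$, define an auxiliary sequence $I^{M_n} = (I_1, \ldots, I_{M_n})$ where $I_j = 1$ if $Y_j$ is an
inserted bit, and $I_j = 0$ otherwise. The sequence $I^{M_n}$ indicates the positions of all the inserted bits in $Y^{M_n}$,
and is not unique for a given $(X^n, Y^{M_n})$. Using $I^{M_n}$, we can decompose $H_P(X^n | Y^{M_n})$ as

$$H_P(X^n | Y^{M_n}) = H_P(X^n, I^{M_n} | Y^{M_n}) - H_P(I^{M_n} | X^n, Y^{M_n}) = H_P(I^{M_n} | Y^{M_n}) - H_P(I^{M_n} | X^n, Y^{M_n}),$$

since $H_P(X^n | Y^{M_n}, I^{M_n}) = 0$. Therefore,

$$\limsup_{n \to \infty} \frac{1}{n} H_P(X^n | Y^{M_n}) = \limsup_{n \to \infty} \frac{1}{n} \left( H_P(I^{M_n} | Y^{M_n}) - H_P(I^{M_n} | X^n, Y^{M_n}) \right) \le \limsup_{n \to \infty} \frac{1}{n} H_P(I^{M_n} | Y^{M_n}) - \liminf_{n \to \infty} \frac{1}{n} H_P(I^{M_n} | X^n, Y^{M_n}). \tag{15}$$

We will derive an upper bound on $\limsup \frac{1}{n} H_P(I^{M_n} | Y^{M_n})$, and a lower bound on $\liminf \frac{1}{n} H_P(I^{M_n} | X^n, Y^{M_n})$.
Using these in (15), we get an upper bound for $\limsup \frac{1}{n} H_P(X^n | Y^{M_n})$, which can then be used in (4).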

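Proposition 4.1. The process $\{\mathbf{I}, \mathbf{Y}\} = \{(I_1, Y_1), (I_2, Y_2), \ldots\}$ is a second-order Markov process characterized
by the following joint distribution for all $m \in \mathbb{N}$:

$$P(I^m, Y^m) = P(I_1, Y_1)\, P(I_2, Y_2 | I_1, Y_1) \prod_{j=3}^{m} P(I_j, Y_j | I_{j-1}, Y_{j-1}, Y_{j-2}),$$

where for $x, y \in \{0, 1\}$ and $j \ge 3$:

$$\begin{aligned}
P(I_j = 1, Y_j = y \mid I_{j-1} = 0, Y_{j-1} = y, Y_{j-2} = x) &= i\alpha, & P(I_j = 1, Y_j = \bar{y} \mid I_{j-1} = 0, Y_{j-1} = y, Y_{j-2} = x) &= i\bar{\alpha}, \\
P(I_j = 0, Y_j = y \mid I_{j-1} = 0, Y_{j-1} = y, Y_{j-2} = x) &= \bar{i}\gamma, & P(I_j = 0, Y_j = \bar{y} \mid I_{j-1} = 0, Y_{j-1} = y, Y_{j-2} = x) &= \bar{i}\bar{\gamma}, \\
P(I_j = 0, Y_j = x \mid I_{j-1} = 1, Y_{j-1} = y, Y_{j-2} = x) &= \gamma, & P(I_j = 0, Y_j = \bar{x} \mid I_{j-1} = 1, Y_{j-1} = y, Y_{j-2} = x) &= \bar{\gamma}.
\end{aligned} \tag{16}$$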

Proof. We need to show that for all $j \ge 3$, the following Markov relation holds: $(I_j, Y_j) - (I_{j-1}, Y_{j-1}, Y_{j-2}) - (I^{j-2}, Y^{j-3})$.
First consider $P(I_j, Y_j \mid I_{j-1} = 0, Y_{j-1} = y, I^{j-2}, Y^{j-2})$. Since $I_{j-1} = 0$, $Y_{j-1}$ is the most recent
input bit (say $X_a$) before $Y_j$. $P(I_j = 0, Y_j = y \mid I_{j-1} = 0, Y_{j-1} = y, I^{j-2}, Y^{j-2})$ is the probability that the
following independent events both occur: 1) the input bit $X_{a+1}$ equals $X_a$, and 2) there was no insertion after
input bit $X_a$. Since the insertion process is i.i.d. and is independent of the first-order Markov input process $\mathbf{X}$,
we have

$$P(I_j = 0, Y_j = y \mid I_{j-1} = 0, Y_{j-1} = y, I^{j-2}, Y^{j-2}) = (1 - i)\gamma.$$

Similarly, we obtain

$$\begin{aligned}
P(I_j = 0, Y_j = \bar{y} \mid I_{j-1} = 0, Y_{j-1} = y, I^{j-2}, Y^{j-2}) &= (1 - i)(1 - \gamma), \\
P(I_j = 1, Y_j = y \mid I_{j-1} = 0, Y_{j-1} = y, I^{j-2}, Y^{j-2}) &= i\alpha, \\
P(I_j = 1, Y_j = \bar{y} \mid I_{j-1} = 0, Y_{j-1} = y, I^{j-2}, Y^{j-2}) &= i(1 - \alpha).
\end{aligned}$$

Next consider $P(I_j, Y_j \mid I_{j-1} = 1, Y_{j-1} = y, Y_{j-2} = x, I^{j-2}, Y^{j-3})$. Since $I_{j-1} = 1$, $Y_{j-2}$ is the most recent
input bit (say, $X_a$) before $Y_j$. Also note that $Y_j$ is the input bit $X_{a+1}$ since $Y_{j-1}$ is an insertion. (At most one
insertion can occur after each input bit.) Hence $P(I_j = 0, Y_j = x \mid I_{j-1} = 1, Y_{j-1} = y, Y_{j-2} = x, I^{j-2}, Y^{j-3})$ is
just the probability that $X_{a+1} = X_a$, which is equal to $\gamma$. Similarly,

$$P(I_j = 0, Y_j = \bar{x} \mid I_{j-1} = 1, Y_{j-1} = y, Y_{j-2} = x, I^{j-2}, Y^{j-3}) = 1 - \gamma.$$

□



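Remark: Proposition 4.1 implies that the process $\{\mathbf{I}, \mathbf{Y}\}$ can be characterized as a Markov chain with
state at time $j$ given by $(I_j, Y_j, Y_{j-1})$. This is an aperiodic, irreducible Markov chain. Hence a stationary
distribution $\pi$ exists, which can be verified to be

$$\begin{aligned}
\pi(I_j = 1, Y_j = y, Y_{j-1} = y) &= \frac{i\alpha}{2(1 + i)}, & \pi(I_j = 1, Y_j = \bar{y}, Y_{j-1} = y) &= \frac{i\bar{\alpha}}{2(1 + i)}, \\
\pi(I_j = 0, Y_j = y, Y_{j-1} = y) &= \frac{\bar{i}\gamma + i\alpha\gamma + i\bar{\alpha}\bar{\gamma}}{2(1 + i)}, & \pi(I_j = 0, Y_j = \bar{y}, Y_{j-1} = y) &= \frac{\bar{i}\bar{\gamma} + i\bar{\alpha}\gamma + i\alpha\bar{\gamma}}{2(1 + i)},
\end{aligned} \tag{17}$$

for $y \in \{0, 1\}$.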
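The stationarity claim can be checked numerically. The sketch below is an illustration under stated assumptions: it encodes the transition probabilities of Proposition 4.1 and the candidate stationary distribution as read from (16) and (17), with states ordered as (I_j, Y_j, Y_{j-1}); the function names are not from the paper.

```python
from itertools import product

def transition(state, nxt, i, alpha, gamma):
    """P(next state | state) for states (I_j, Y_j, Y_{j-1})."""
    ins, y, x = state
    ins_next, y_next, y_prev = nxt
    if y_prev != y:                       # next state must remember Y_j
        return 0.0
    if ins == 0:                          # Y_j is an input bit
        if ins_next == 1:
            return i * alpha if y_next == y else i * (1 - alpha)
        return (1 - i) * gamma if y_next == y else (1 - i) * (1 - gamma)
    if ins_next == 1:                     # no two insertions in a row
        return 0.0
    # Y_j was inserted, so Y_{j+1} is the input bit that follows Y_{j-1}
    return gamma if y_next == x else 1 - gamma

def stationary(state, i, alpha, gamma):
    """Candidate stationary probability of (I_j, Y_j, Y_{j-1})."""
    ins, y, x = state
    ib, ab, gb = 1 - i, 1 - alpha, 1 - gamma
    if ins == 1:
        numerator = i * alpha if y == x else i * ab
    elif y == x:
        numerator = ib * gamma + i * alpha * gamma + i * ab * gb
    else:
        numerator = ib * gb + i * ab * gamma + i * alpha * gb
    return numerator / (2 * (1 + i))

def stationarity_gap(i, alpha, gamma):
    """Largest entry of |pi P - pi| over the eight states."""
    states = list(product([0, 1], repeat=3))
    pi = {s: stationary(s, i, alpha, gamma) for s in states}
    return max(
        abs(sum(pi[s] * transition(s, t, i, alpha, gamma) for s in states) - pi[t])
        for t in states
    )
```

Calling `stationarity_gap(0.3, 0.6, 0.7)` should return a value at the level of floating-point round-off, which is what invariance of the distribution under the chain requires.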

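Lemma 4.1. $\limsup_{n \to \infty} \frac{1}{n} H_P(I^{M_n} | Y^{M_n}) = (1 + i) \limsup_{m \to \infty} \frac{1}{m} H_P(I^m | Y^m)$.

Proof. See Appendix A.1. □

Lemma 4.2. $\limsup_{m \to \infty} \frac{1}{m} H_P(I^m | Y^m) \le \lim_{j \to \infty} H_P(I_j | I_{j-1}, Y_j, Y_{j-1}, Y_{j-2})$, and

$$\lim_{j \to \infty} H_P(I_j | I_{j-1}, Y_j, Y_{j-1}, Y_{j-2}) = \frac{i\alpha + \bar{i}\gamma}{1 + i}\, h\!\left( \frac{i\alpha}{i\alpha + \bar{i}\gamma} \right) + \frac{i\bar{\alpha} + \bar{i}\bar{\gamma}}{1 + i}\, h\!\left( \frac{i\bar{\alpha}}{i\bar{\alpha} + \bar{i}\bar{\gamma}} \right). \tag{18}$$

Proof. See Appendix A.2. □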

Next, we focus on $H(I^{M_n} | Y^{M_n}, X^n)$, which is the uncertainty in the positions of the insertions given both
the channel input and output sequences. For example, given input $\underline{X} = 00010$ and output $\underline{Y} = 000110$,
we know that there is either a complementary insertion after the third bit of $\underline{X}$ or a duplication after the
fourth bit; so there is uncertainty in the values of $I_4$ and $I_5$ (one of them is zero, and the other is one). But
there is no uncertainty in $I_1, I_2, I_3, I_6$, which are all zero. We use this intuition to obtain a lower bound on
the limiting behavior of $\frac{1}{n} H(I^{M_n} | Y^{M_n}, X^n)$.

Lemma 4.3. $\liminf_{n \to \infty} \frac{1}{n} H_P(I^{M_n} | Y^{M_n}, X^n) = \liminf_{n \to \infty} \frac{1}{n} H_P(I^{n(1+i)} | Y^{n(1+i)}, X^n)$.

Proof. The proof of this lemma is similar to that of Lemma 4.1, and is omitted. □

Lemma 4.4. liminf„^oo ^i?p(/"^^+*^|>'"^^+'\X") > ^^i{a + ia)h (^^f-^ . 

Proof. See Appendix A.3. □
Theorem 1. (LB 1) The capacity of the insertion channel with parameters $(i, \alpha)$ can be lower bounded as

C{i, a) > max ^1(7) — {ia + 17)^1 ( — ) — {ia + i'y)h ( — — ) + 7^1(0; + ia)h 



o<7<i \ia + ij J \ia + i"f J \a + la 



Proof. The result is obtained by using Lemmas 4.1, 4.2, 4.3, and 4.4 in (15), and substituting the resulting
bound in (4). We optimize the lower bound by maximizing over the Markov parameter $\gamma \in (0, 1)$. □
4.2 Lower Bound 2

For any input-pair (X", y*^"), define an auxiliary sequence T*^" = (Ti, . . . , Tm^) where Tj = 1 if is a com- 
plementary insertion, and Tj — otlierwise. The sequence T*^" indicates the positions of the complementary 
insertions in Y^^" . Note that T*^" is different from the sequence /*^" , which indicates the positions of all the 
insertions. Using T*^", we can decompose i/p(X"|F*^") as 

< Hp (T^^" I r ) + ijp (X" I y ) - i/p (T*-^- 1 X" , y ) , 

where y*^" is the sequence obtained from (T^'^"^ ,Y^^") by flipping y,- whenever Tj ~ 1, for 1 < j < M„. 
(a) holds in ^ because f*''" is a function of (T*^- , y ) , and hence iJp(X"|T^^",y*^") < i/(X"|y*^"). 
Therefore, we have 

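$$\limsup_{n\to\infty} \frac{1}{n} H_P(X^n \mid Y^{M_n}) \le \limsup_{n\to\infty} \frac{1}{n}\left(H_P(T^{M_n} \mid Y^{M_n}) + H_P(X^n \mid \tilde{Y}^{M_n}) - H_P(T^{M_n} \mid X^n, Y^{M_n})\right). \tag{20}$$

We will show that $\lim_{n\to\infty} \frac{1}{n}H(X^n \mid \tilde{Y}^{M_n})$ exists, and therefore (20) becomes

$$\begin{aligned} \limsup_{n\to\infty} \frac{1}{n} H_P(X^n \mid Y^{M_n}) &\le \limsup_{n\to\infty} \frac{1}{n}\left(H_P(T^{M_n} \mid Y^{M_n}) - H_P(T^{M_n} \mid X^n, Y^{M_n})\right) + \lim_{n\to\infty} \frac{1}{n} H_P(X^n \mid \tilde{Y}^{M_n}) \\ &\le \limsup_{n\to\infty} \frac{1}{n} H_P(T^{M_n} \mid Y^{M_n}) - \liminf_{n\to\infty} \frac{1}{n} H_P(T^{M_n} \mid X^n, Y^{M_n}) + \lim_{n\to\infty} \frac{1}{n} H_P(X^n \mid \tilde{Y}^{M_n}). \end{aligned} \tag{21}$$

We will use (21) in (4) to obtain a lower bound on the insertion capacity.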
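The construction of the flipped sequence can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names and the parameter values are hypothetical, and it only assumes the channel model described in the introduction (after each input bit, an extra bit is inserted with probability i, which is a duplication with probability alpha and a complementary insertion otherwise). It checks that flipping the complementary insertions leaves a sequence with the same number of runs as the input.

```python
import random

def insertion_channel(x, i, alpha, rng):
    """Pass bits x through the insertion channel; return (y, t).

    t[j] = 1 marks y[j] as a complementary insertion, 0 otherwise.
    """
    y, t = [], []
    for bit in x:
        y.append(bit)
        t.append(0)                      # the input bit itself
        if rng.random() < i:             # an extra bit follows this input bit
            if rng.random() < alpha:     # duplication
                y.append(bit)
                t.append(0)
            else:                        # complementary insertion
                y.append(1 - bit)
                t.append(1)
    return y, t

def flip_complementary(y, t):
    """Flip y[j] wherever t[j] = 1 (all insertions become duplications)."""
    return [b ^ f for b, f in zip(y, t)]

def count_runs(seq):
    """Number of maximal blocks of equal consecutive bits."""
    return sum(1 for k in range(len(seq)) if k == 0 or seq[k] != seq[k - 1])

rng = random.Random(0)                   # hypothetical seed and parameters
x = [rng.randint(0, 1) for _ in range(1000)]
y, t = insertion_channel(x, i=0.3, alpha=0.8, rng=rng)
y_tilde = flip_complementary(y, t)
print(count_runs(x), count_runs(y), count_runs(y_tilde))
```

The run counts of the input and of the flipped output agree, which is the property used to pass to the run-length representation; the unflipped output will typically contain extra runs.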
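Lemma 4.5. $\limsup_{n\to\infty} \frac{1}{n}H_P(T^{M_n} \mid Y^{M_n}) = (1 + i) \limsup_{m\to\infty} \frac{1}{m}H_P(T^m \mid Y^m)$.

Proof. The proof of this lemma is identical to that of Lemma 4.1 and can be obtained with $T^{M_n}$ replacing $I^{M_n}$. □

Lemma 4.6. $\limsup_{m\to\infty} \frac{1}{m}H_P(T^m \mid Y^m) \le \lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1})$, and

$$\lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1}) = \frac{\bar{\gamma} + \gamma i\bar{\alpha}}{1+i}\, h\!\left(\frac{i\bar{\alpha}}{\bar{\gamma} + \gamma i\bar{\alpha}}\right). \tag{22}$$

Proof. See Appendix A.4. □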

We now determine the limiting behavior of $\frac{1}{n}H(X^n \mid \tilde{Y}^{M_n})$. Recall that $\tilde{Y}^{M_n}$ is obtained by flipping the
complementary insertions in $Y^{M_n}$. In other words, $\tilde{Y}^{M_n}$ has insertions in the same locations as $Y^{M_n}$, but the
insertions are all duplications. Hence $\tilde{Y}^{M_n}$ has the same number of runs as $X^n$. Recall from Section [i] that
we can represent both binary sequences in terms of their run-lengths as

$$X^n \leftrightarrow (L^X_1, \ldots, L^X_{R_n}), \qquad \tilde{Y}^{M_n} \leftrightarrow (L^{\tilde{Y}}_1, \ldots, L^{\tilde{Y}}_{R_n}),$$

where $R_n$, the number of runs in $X^n$ (and $\tilde{Y}^{M_n}$), is a random variable. Therefore, for all $n$, we have

$$H_P(X^n \mid \tilde{Y}^{M_n}) = H_P(L^X_1, \ldots, L^X_{R_n} \mid L^{\tilde{Y}}_1, \ldots, L^{\tilde{Y}}_{R_n}). \tag{23}$$

Proposition 4.2. The process $\{L^X, L^{\tilde{Y}}\} = \{(L^X_1, L^{\tilde{Y}}_1), (L^X_2, L^{\tilde{Y}}_2), \ldots\}$ is an i.i.d. process characterized by the
following joint distribution for all $j \ge 1$:

$$P(L^X_j = r, L^{\tilde{Y}}_j = s) = \gamma^{r-1}(1-\gamma) \cdot \binom{r}{s-r}\, i^{s-r}(1-i)^{2r-s}, \qquad r = 1, 2, \ldots, \quad r \le s \le 2r.$$

Proof. Since $X$ is a Markov process, $\{L^X_j\}_{j\ge 1}$ are independent with

$$P(L^X_j = r) = \gamma^{r-1}(1-\gamma), \qquad r = 1, 2, \ldots$$

$\tilde{Y}^{M_n}$ is generated from $X^n$ by independently duplicating each bit with probability $i$. Hence $L^{\tilde{Y}}_j$ can be thought
of as being obtained by passing a run of length $L^X_j$ through a discrete memoryless channel with transition
probability

$$P(L^{\tilde{Y}}_j = s \mid L^X_j = r) = \binom{r}{s-r}\, i^{s-r}(1-i)^{2r-s}, \qquad r \le s \le 2r.$$

□
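The run-length channel just described lends itself to a direct numerical evaluation. The following is a minimal sketch, not the authors' code: it assumes geometric input runs with parameter gamma and a binomial number of duplicated bits per run, truncates the run length at a hypothetical cutoff `r_max`, and computes the conditional entropy of the input run length given the output run length in bits.

```python
from math import comb, log2

def joint_pmf(gamma, i, r_max=200):
    """Joint pmf of (input run length r, output run length s), truncated at r_max.

    Assumes P(r) = gamma^(r-1) (1-gamma) and that each of the r bits is
    duplicated independently with probability i, so r <= s <= 2r.
    """
    pmf = {}
    for r in range(1, r_max + 1):
        p_r = gamma ** (r - 1) * (1 - gamma)
        for s in range(r, 2 * r + 1):
            pmf[(r, s)] = p_r * comb(r, s - r) * i ** (s - r) * (1 - i) ** (2 * r - s)
    return pmf

def conditional_entropy(pmf):
    """H(input run length | output run length) in bits."""
    p_s = {}
    for (r, s), p in pmf.items():
        p_s[s] = p_s.get(s, 0.0) + p
    return sum(p * log2(p_s[s] / p) for (r, s), p in pmf.items() if p > 0)

pmf = joint_pmf(gamma=0.5, i=0.1)        # hypothetical parameter values
print(sum(pmf.values()), conditional_entropy(pmf))
```

The truncation at `r_max` is an approximation; for gamma close to 1 a larger cutoff would be needed before the neglected tail mass becomes negligible.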

Lemma 4.7. $\lim_{n\to\infty} \frac{1}{n}H_P(X^n \mid \tilde{Y}^{M_n}) = (1 - \gamma)H_P(L^X \mid L^{\tilde{Y}})$, where the joint distribution of $(L^X, L^{\tilde{Y}})$ is given
by Proposition 4.2.

Proof. See Appendix A.5. □

Finally, we need to analyze $\frac{1}{n}H(T^{M_n} \mid Y^{M_n}, X^n)$, the uncertainty in the positions of the complementary
insertions given both the channel input and output sequences. For example, given input $X = 00010$, and
output $Y = 000110$, we know that there is either a complementary insertion after the third bit of $X$
or a duplication after the fourth bit; so there is uncertainty in the value of $T_4$. There is no uncertainty in
$T_1, T_2, T_3, T_5, T_6$, which are all zero. We use this intuition to obtain a lower bound on the limiting behavior of
$\frac{1}{n}H(T^{M_n} \mid Y^{M_n}, X^n)$.
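The ambiguity in this example can be checked by brute force. The sketch below is only illustrative and assumes the example pair is X = 00010 and Y = 000110; the pattern alphabet ('n' for no insertion, 'd' for duplication, 'c' for complementary insertion) and the function name are hypothetical. It enumerates every insertion pattern and collects the T sequences consistent with the observed output.

```python
from itertools import product

def channel_output(x, pattern):
    """Apply an insertion pattern to x.

    pattern[k] is 'n' (nothing), 'd' (duplicate bit k) or
    'c' (insert the complement after bit k). Returns (y, t).
    """
    y, t = [], []
    for bit, action in zip(x, pattern):
        y.append(bit)
        t.append(0)
        if action == 'd':
            y.append(bit)
            t.append(0)
        elif action == 'c':
            y.append(1 - bit)
            t.append(1)
    return y, t

x = [0, 0, 0, 1, 0]
y_observed = [0, 0, 0, 1, 1, 0]

consistent = set()
for pattern in product('ndc', repeat=len(x)):
    y, t = channel_output(x, pattern)
    if y == y_observed:
        consistent.add(tuple(t))

print(sorted(consistent))
```

Exactly two T sequences survive, and they differ only in the fourth position, matching the discussion above.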

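Lemma 4.8. $\liminf_{n\to\infty} \frac{1}{n}H_P(T^{M_n} \mid Y^{M_n}, X^n) = \liminf_{n\to\infty} \frac{1}{n}H_P(T^{n(1+i)} \mid Y^{n(1+i)}, X^n)$.

Proof. The proof of this lemma is similar to that of Lemma 4.1 and is omitted. □

Lemma 4.9. $\liminf_{n\to\infty} \frac{1}{n}H_P(T^{n(1+i)} \mid Y^{n(1+i)}, X^n) \ge \bar{\gamma}^2\, i(\bar{\alpha} + \bar{i}\alpha)\, h\!\left(\frac{\bar{\alpha}}{\bar{\alpha} + \bar{i}\alpha}\right)$.

Proof. See Appendix A.6. □

Theorem 2. (LB 2) The capacity of the insertion channel with parameters $(i, \alpha)$ can be lower bounded as

$$C(i,\alpha) \ge \max_{0<\gamma<1}\; h(\gamma) - (\bar{\gamma} + \gamma i\bar{\alpha})\, h\!\left(\frac{i\bar{\alpha}}{\bar{\gamma} + \gamma i\bar{\alpha}}\right) - \bar{\gamma}\, H(L^X \mid L^{\tilde{Y}}) + \bar{\gamma}^2\, i(\bar{\alpha} + \bar{i}\alpha)\, h\!\left(\frac{\bar{\alpha}}{\bar{\alpha} + \bar{i}\alpha}\right),$$

where $H(L^X \mid L^{\tilde{Y}})$ is computed using the joint distribution given in Proposition 4.2.

Proof. The result is a direct consequence of using Lemmas 4.5 - 4.9 in (21), and substituting the resulting
upper bound for $\limsup_{n\to\infty} \frac{1}{n}H(X^n \mid Y^{M_n})$ in (4). □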

Figure 1 compares Lower bound 1 with Lower bound 2 for different values of $i$ with $\alpha$ fixed at 0.8. We
observe that LB 2 is generally a better bound than LB 1, except when $i$ is large. For large $i$, it is more efficient
to decode the positions of all the insertions (since $i$ is large) rather than just the complementary insertions.
Specifically, comparing Lemmas 4.2 and 4.6,

$$\lim_{j\to\infty} H(I_j \mid I_{j-1}, Y_j, Y_{j-1}, Y_{j-2}) < \lim_{j\to\infty} H(T_j \mid T_{j-1}, Y_j, Y_{j-1})$$

for large values of $i$. Combining the bounds of Theorems 1 and 2, we observe that $\max\{\text{LB 1}, \text{LB 2}\}$ is a
lower bound to the insertion capacity. This is plotted in Figure 2 for various values of $i$, for $\alpha = 0.5, 0.8, 1$.
For $\alpha = 1$, the bound is very close to the near-optimal lower bound in [12]. The gap occurs because we have
used a Markov input distribution instead of the numerically optimized input distribution in [12].

Figure 1: Comparison of the two lower bounds for different values of $i$ with $\alpha = 0.8$



5 Deletion Channel 

In this channel, each input bit is deleted with probability $d$, or retained with probability $1 - d$. For any
input-output pair $(X^n, Y^{M_n})$, define the auxiliary sequence $S^{M_n+1}$, where $S_j \in \mathbb{N}_0$ is the number of runs
completely deleted in $X^n$ between the bits corresponding to $Y_{j-1}$ and $Y_j$. ($S_1$ is the number of runs deleted
before the first output symbol $Y_1$, and $S_{M_n+1}$ is the number of runs deleted after the last output symbol $Y_{M_n}$.)
Examples of $S^{M_n+1}$ for the input-output pair ($X = 000111000$, $Y = 000$) were given in Section 3.2.

The auxiliary sequence $S$ enables us to augment $Y$ with the positions of missing runs. Consider $X = 000111000$.
If the decoder was given $Y = 000$ and $S = (0, 0, 0, 2)$, it can form the augmented sequence
$Y' = 000 - -$, where a $-$ denotes a missing run, or equivalently a 'run of length 0' in $Y$. With the "$-$" markers
indicating deleted runs, we can associate each run of the augmented sequence uniquely with a run in $X$.
Denote by $L^{Y'}_1, L^{Y'}_2, \ldots$ the run-lengths of the augmented sequence $Y'$, where $L^{Y'}_j = 0$ if the run is a $-$. Then
we have

$$P(X, Y') = P(L^X_1)P(L^{Y'}_1 \mid L^X_1) \cdot P(L^X_2)P(L^{Y'}_2 \mid L^X_2) \cdots \tag{24}$$

where for all $j$:

$$P(L^X_j = r) = \gamma^{r-1}(1-\gamma), \quad r = 1, 2, \ldots, \qquad P(L^{Y'}_j = s \mid L^X_j = r) = \binom{r}{s} d^{r-s}(1-d)^s, \quad 0 \le s \le r. \tag{25}$$



Using the auxiliary sequence $S^{M_n+1}$, we can decompose $H_P(X^n \mid Y^{M_n})$ as

$$H_P(X^n \mid Y^{M_n}) = H_P(X^n, S^{M_n+1} \mid Y^{M_n}) - H_P(S^{M_n+1} \mid X^n, Y^{M_n}). \tag{26}$$

We therefore have

$$\limsup_{n\to\infty} \frac{1}{n}H_P(X^n \mid Y^{M_n}) \le \limsup_{n\to\infty} \frac{1}{n}H_P(X^n, S^{M_n+1} \mid Y^{M_n}). \tag{27}$$

We will show that $\lim_{n\to\infty} \frac{1}{n}H_P(X^n, S^{M_n+1} \mid Y^{M_n})$ exists, and obtain an analytical expression for this limit.
Using this in (4), we obtain a lower bound on the deletion capacity. We remark that it has been shown in [6]
that for any input distribution with independent runs, $\lim_{n\to\infty} \frac{1}{n}H(X^n \mid Y^{M_n})$ exists for the deletion channel.
Hence the lim sup on the left hand side of (27) is actually a limit.



Proposition 5.1. The process $Y = \{Y_1, Y_2, \ldots\}$ is a first-order Markov process characterized by the following
joint distribution for all $m \in \mathbb{N}$:

$$P(Y^m) = P(Y_1) \prod_{j=2}^{m} P(Y_j \mid Y_{j-1}),$$

where for $y \in \{0, 1\}$

$$P(Y_1 = y) = 0.5, \qquad P(Y_j = y \mid Y_{j-1} = y) = 1 - P(Y_j = \bar{y} \mid Y_{j-1} = y) = \frac{\gamma + d - 2\gamma d}{1 + d - 2\gamma d}. \tag{28}$$

Proof. The proof of this lemma can be found in [8].

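Proposition 5.2. The process $\{S, Y\} = \{(S_1, Y_1), (S_2, Y_2), \ldots\}$ is a first-order Markov process characterized
by the following joint distribution for all $m \in \mathbb{N}$:

$$P(S^m, Y^m) = P(Y_1, S_1) \prod_{j=2}^{m} P(Y_j, S_j \mid Y_{j-1}),$$

where for $y \in \{0, 1\}$ and $j \ge 2$:

$$P(Y_j = y, S_j = k \mid Y_{j-1} = y) = \begin{cases} \dfrac{\gamma(1-d)}{1-\gamma d}, & k = 0, \\[2mm] \dfrac{(1-d)(1-\gamma)}{(1-\gamma d)^2} \left(\dfrac{d(1-\gamma)}{1-\gamma d}\right)^{k}, & k = 1, 3, \ldots \\[2mm] 0, & \text{otherwise} \end{cases} \tag{29}$$

$$P(Y_j = \bar{y}, S_j = k \mid Y_{j-1} = y) = \begin{cases} \dfrac{(1-d)(1-\gamma)}{(1-\gamma d)^2} \left(\dfrac{d(1-\gamma)}{1-\gamma d}\right)^{k}, & k = 0, 2, 4, \ldots \\[2mm] 0, & \text{otherwise} \end{cases} \tag{30}$$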
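Proof. We need to show that

$$P(Y_j = y, S_j = k \mid Y_{j-1} = y_{j-1}, S_{j-1} = s_{j-1}, Y_{j-2} = y_{j-2}, S_{j-2} = s_{j-2}, \ldots) = P(Y_j = y, S_j = k \mid Y_{j-1} = y_{j-1}),$$

for all $y, y_{j-1}, y_{j-2}, \ldots \in \{0, 1\}$ and $k, s_{j-1}, \ldots \in \mathbb{N}_0$.

Let the output symbols $Y_j, Y_{j-1}, Y_{j-2}, \ldots$ correspond to input symbols $X_{a_j}, X_{a_{j-1}}, X_{a_{j-2}}, \ldots$ for some
positive integers $a_j > a_{j-1} > a_{j-2} > \cdots$. $S_{j-1}$ is the number of runs between the input symbols $X_{a_{j-2}}$ and
$X_{a_{j-1}}$, not counting the runs containing $X_{a_{j-2}}$ and $X_{a_{j-1}}$. Similarly, $S_{j-2}$ is the number of runs between the
input symbols $X_{a_{j-3}}$ and $X_{a_{j-2}}$, not counting the runs containing $X_{a_{j-3}}$ and $X_{a_{j-2}}$, etc.

First consider the case where $Y_j = Y_{j-1} = y$. When $Y_j = X_{a_j} = y$ and $Y_{j-1} = X_{a_{j-1}} = y$, note that $S_j$,
the number of completely deleted runs between $X_{a_{j-1}}$ and $X_{a_j}$, is either zero or an odd number. We have

$$P(Y_j = y, S_j = 0 \mid Y_{j-1} = y, S_{j-1} = s_{j-1}, Y_{j-2} = y_{j-2}, S_{j-2} = s_{j-2}, \ldots) \overset{(a)}{=} \sum_{m=1}^{\infty} \gamma^m (1-\gamma)(1 - d^m) = \frac{\gamma(1-d)}{1-\gamma d}, \tag{31}$$

where (a) is obtained as follows. $\gamma^m(1-\gamma)$ is the probability that the input run containing $X_{a_{j-1}}$ contains $m$
bits after $X_{a_{j-1}}$, and $(1 - d^m)$ is the probability that at least one of them is not deleted. This needs to hold
for some $m \ge 1$ in order to have $S_j = 0$ and $Y_j = Y_{j-1}$. By reasoning similar to the above, we have for
$k = 1, 3, 5, \ldots$

$$\begin{aligned} &P(Y_j = y, S_j = k \mid Y_{j-1} = y, S_{j-1} = s_{j-1}, Y_{j-2} = y_{j-2}, S_{j-2} = s_{j-2}, \ldots) \\ &\quad \overset{(b)}{=} \left(\sum_{m=0}^{\infty} \gamma^m(1-\gamma)d^m\right) \left(\sum_{m=1}^{\infty} \gamma^{m-1}(1-\gamma)d^m\right)^{k} \left(\sum_{m=1}^{\infty} \gamma^{m-1}(1-\gamma)(1-d^m)\right) \\ &\quad = \frac{(1-\gamma)(1-d)}{(1-\gamma d)^2} \left[\frac{d(1-\gamma)}{1-\gamma d}\right]^{k}, \end{aligned} \tag{32}$$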



where the first term in (b) is the probability that the remainder of the run containing $X_{a_{j-1}}$ is completely
deleted, the second term is the probability that the next $k$ runs are deleted, and the last term is the probability
that the subsequent run is not completely deleted.

When $Y_j = \bar{y}$ and $Y_{j-1} = y$, the number of deleted runs $S_j$ is either zero or an even number. For
$k = 0, 2, 4, \ldots$ we have

$$\begin{aligned} &P(Y_j = \bar{y}, S_j = k \mid Y_{j-1} = y, S_{j-1} = s_{j-1}, Y_{j-2} = y_{j-2}, S_{j-2} = s_{j-2}, \ldots) \\ &\quad \overset{(c)}{=} \left(\sum_{m=0}^{\infty} \gamma^m(1-\gamma)d^m\right) \left(\sum_{m=1}^{\infty} \gamma^{m-1}(1-\gamma)d^m\right)^{k} \left(\sum_{m=1}^{\infty} \gamma^{m-1}(1-\gamma)(1-d^m)\right) \\ &\quad = \frac{(1-\gamma)(1-d)}{(1-\gamma d)^2} \left[\frac{d(1-\gamma)}{1-\gamma d}\right]^{k}. \end{aligned} \tag{33}$$

In the above, the first term in (c) is the probability that the remainder of the run containing $X_{a_{j-1}}$ is completely
deleted, the second term is the probability that the next $k$ runs are deleted ($k$ may be equal to zero), and the
third term is the probability that the subsequent run is not completely deleted. This completes the proof of
the lemma. □
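A quick numerical sanity check of the transition probabilities is possible. The sketch below is an illustration rather than part of the proof: it assumes the geometric-series evaluations derived above, namely gamma(1-d)/(1-gamma d) for the case of zero deleted runs with an unchanged output bit and beta * theta^k otherwise, with theta = d(1-gamma)/(1-gamma d) and beta = (1-gamma)(1-d)/(1-gamma d)^2. The function names and parameter values are hypothetical. Summing over the number of deleted runs should recover the Markov transition probability of the output process.

```python
def stay_probability(gamma, d):
    """P(Y_j = Y_{j-1}) for the deletion-channel output with Markov input."""
    return (gamma + d - 2 * gamma * d) / (1 + d - 2 * gamma * d)

def joint_transition(k, same, gamma, d):
    """P(Y_j = y', S_j = k | Y_{j-1} = y), with same=True meaning y' = y."""
    theta = d * (1 - gamma) / (1 - gamma * d)          # one whole run deleted
    beta = (1 - gamma) * (1 - d) / (1 - gamma * d) ** 2
    if same:
        if k == 0:
            return gamma * (1 - d) / (1 - gamma * d)
        return beta * theta ** k if k % 2 == 1 else 0.0
    return beta * theta ** k if k % 2 == 0 else 0.0

gamma, d = 0.6, 0.3                                    # hypothetical values
p_same = sum(joint_transition(k, True, gamma, d) for k in range(200))
p_diff = sum(joint_transition(k, False, gamma, d) for k in range(200))
print(p_same, stay_probability(gamma, d), p_same + p_diff)
```

The first two printed values agree and the third is 1 up to rounding, which is consistent with the output process being first-order Markov with the stated parameter.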

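We now show that $\lim_{n\to\infty} \frac{1}{n}H_P(S^{M_n+1} \mid Y^{M_n})$ and $\lim_{n\to\infty} \frac{1}{n}H_P(X^n \mid Y^{M_n}, S^{M_n+1})$ each exist, thereby
proving the existence of $\lim_{n\to\infty} \frac{1}{n}H_P(X^n, S^{M_n+1} \mid Y^{M_n})$.

Lemma 5.1. $\lim_{n\to\infty} \frac{1}{n}H_P(S^{M_n+1} \mid Y^{M_n}) = (1 - d)H_P(S_2 \mid Y_1 Y_2)$, where the joint distribution of $(Y_1, Y_2, S_2)$ is
given by (28), (29) and (30).

Proof. See Appendix B.1. □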

To determine the limiting behavior of $\frac{1}{n}H_P(X^n \mid S^{M_n+1}, Y^{M_n})$, we recall that $X^n$ can be equivalently
represented in terms of its run-lengths as $(L^X_1, \ldots, L^X_{R_n})$, where $R_n$, the number of runs in $X^n$, is a random variable.
Also recall from the discussion at the beginning of this section that the pair of sequences $(S^{M_n+1}, Y^{M_n})$ is
equivalent to an augmented sequence $Y'$ formed by adding the positions of the deleted runs to $Y = Y^{M_n}$.
$Y'$ can be equivalently represented in terms of its run-lengths as $(L^{Y'}_1, \ldots, L^{Y'}_{R_n})$, where we emphasize that
$L^{Y'}_1, L^{Y'}_2, \ldots$ can take value 0 as well. To summarize, we have

$$X^n \leftrightarrow (L^X_1, \ldots, L^X_{R_n}), \qquad (S^{M_n+1}, Y^{M_n}) \leftrightarrow (L^{Y'}_1, \ldots, L^{Y'}_{R_n}). \tag{34}$$

Thus, for all $n$,

$$H_P(X^n \mid S^{M_n+1}, Y^{M_n}) = H_P(L^X_1, \ldots, L^X_{R_n} \mid L^{Y'}_1, \ldots, L^{Y'}_{R_n}). \tag{35}$$

Proposition 5.3. The process $\{L^X, L^{Y'}\} = \{(L^X_1, L^{Y'}_1), (L^X_2, L^{Y'}_2), \ldots\}$ is an i.i.d. process characterized by
the following joint distribution for all $j \ge 1$:

$$P(L^X_j = r, L^{Y'}_j = s) = \gamma^{r-1}(1-\gamma) \cdot \binom{r}{s} d^{r-s}(1-d)^s, \qquad r = 1, 2, \ldots, \quad 0 \le s \le r. \tag{36}$$

Proof. Since $X$ is a Markov process, $\{L^X_j\}_{j\ge 1}$ are independent with

$$P(L^X_j = r) = \gamma^{r-1}(1-\gamma), \qquad r = 1, 2, \ldots$$

Since the deletion process is i.i.d., each $L^{Y'}_j$ can be thought of as being obtained by passing a run of length $L^X_j$
through a discrete memoryless channel with transition probability

$$P(L^{Y'}_j = s \mid L^X_j = r) = \binom{r}{s} d^{r-s}(1-d)^s, \qquad 0 \le s \le r.$$

□

Lemma 5.2. $\lim_{n\to\infty} \frac{1}{n}H_P(X^n \mid S^{M_n+1}, Y^{M_n}) = (1 - \gamma)H_P(L^X \mid L^{Y'})$, where the joint distribution of $(L^X, L^{Y'})$
is given by (36).

Proof. See Appendix B.2. □

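Using Lemmas 5.1 and 5.2, we obtain the following lower bound on the capacity of the deletion channel.

Theorem 3. The deletion channel capacity $C(d)$ can be lower bounded as

$$C(d) \ge \max_{0<\gamma<1}\; h(\gamma) - (1 - d)H(S_2 \mid Y_1 Y_2) - (1 - \gamma)H(L^X \mid L^{Y'}),$$

where

$$H(S_2 \mid Y_1 Y_2) = \frac{\gamma(1-d)}{1-\gamma d} \log_2 \frac{q(1-\gamma d)}{\gamma(1-d)} + \frac{\beta\theta}{1-\theta^2} \log_2 \frac{q}{\beta} + \frac{\beta}{1-\theta^2} \log_2 \frac{\bar{q}}{\beta} + \frac{\beta\theta}{(1-\theta)^2} \log_2 \frac{1}{\theta}, \tag{37}$$

$$q = \frac{\gamma + d - 2\gamma d}{1 + d - 2\gamma d}, \qquad \theta = \frac{(1-\gamma)d}{1-\gamma d}, \qquad \beta = \frac{(1-\gamma)(1-d)}{(1-\gamma d)^2},$$

and

$$H(L^X \mid L^{Y'}) = -\sum_{r=1}^{\infty} \sum_{s=0}^{r} P(L^X = r, L^{Y'} = s) \log_2 P(L^X = r \mid L^{Y'} = s), \tag{38}$$

with the joint distribution $P(L^X = r, L^{Y'} = s)$ given by (36).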

Proof. Combining (4) and (27), we obtain

$$C(d) \ge h(\gamma) - \limsup_{n\to\infty} \frac{H_P(X^n, S^{M_n+1} \mid Y^{M_n})}{n}. \tag{39}$$

Figure 3: Lower bound of Theorem 3 on the deletion capacity (horizontal axis: deletion probability $d$). The lower bound from [s] is shown in dashed
lines.



From Lemmas 5.1 and 5.2, we have

$$\begin{aligned} \lim_{n\to\infty} \frac{1}{n}H_P(X^n, S^{M_n+1} \mid Y^{M_n}) &= \lim_{n\to\infty} \frac{1}{n}H_P(S^{M_n+1} \mid Y^{M_n}) + \lim_{n\to\infty} \frac{1}{n}H_P(X^n \mid S^{M_n+1}, Y^{M_n}) \\ &= (1 - d)H(S_2 \mid Y_1 Y_2) + (1 - \gamma)H(L^X \mid L^{Y'}). \end{aligned} \tag{40}$$

$H(S_2 \mid Y_1 Y_2)$ can then be computed using the joint distribution given by (28), (29), and (30). $H(L^X \mid L^{Y'})$ can
be computed using the joint distribution given in Proposition 5.3. Finally, we optimize the lower bound by
maximizing over the Markov parameter $\gamma \in (0, 1)$. □

Figure 3 shows the lower bound of Theorem 3 as well as the lower bound of [s] for various values of $d$. We
observe that our bound is close to, but slightly smaller than, that of [s], which is the best known lower bound
on the deletion capacity.

In the decomposition of $H_P(X^n \mid Y^{M_n})$ in (26), we dropped the term $H_P(S^{M_n+1} \mid X^n, Y^{M_n})$ to obtain the
bound in (27). $H_P(S^{M_n+1} \mid X^n, Y^{M_n})$ is the uncertainty in the positions of the deleted runs given both the
input and output sequences. For example, if $X^n = 001100$ and $Y^{M_n} = 000$, then $S^{M_n+1}$ equals

• (0, 1, 0, 0) if the deletion pattern is either 0[0][1][1]00 or [0]0[1][1]00 (bits in brackets are deleted).

• (0, 0, 1, 0) if the deletion pattern is either 00[1][1][0]0 or 00[1][1]0[0].
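The two cases in this example can be reproduced by enumeration. The sketch below is illustrative only and assumes the example pair is X = 001100 and Y = 000; the helper names are hypothetical. For each set of surviving bit positions that yields the observed output, it counts the runs that were completely deleted before, between and after the surviving bits.

```python
from itertools import combinations
from collections import defaultdict

def run_ids(x):
    """Index of the run that each bit of x belongs to (0-based)."""
    ids, r = [], 0
    for k, bit in enumerate(x):
        if k > 0 and bit != x[k - 1]:
            r += 1
        ids.append(r)
    return ids

def deleted_run_counts(x, kept):
    """Sequence S for a non-empty, sorted tuple of surviving positions."""
    ids = run_ids(x)
    s = [ids[kept[0]]]                            # runs deleted before the first output bit
    for a, b in zip(kept, kept[1:]):
        s.append(max(ids[b] - ids[a] - 1, 0))     # runs strictly between two survivors
    s.append(ids[-1] - ids[kept[-1]])             # runs deleted after the last output bit
    return tuple(s)

x = [0, 0, 1, 1, 0, 0]
y = [0, 0, 0]

patterns = defaultdict(list)
for kept in combinations(range(len(x)), len(y)):
    if [x[k] for k in kept] == y:
        patterns[deleted_run_counts(x, kept)].append(kept)

for s, kept_sets in sorted(patterns.items()):
    print(s, kept_sets)
```

Each of the two S sequences is produced by two different deletion patterns, so S is not determined by the input-output pair, which is the uncertainty that the dropped term measures.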

Computing $\lim_{n\to\infty} \frac{1}{n}H_P(S^{M_n+1} \mid X^n, Y^{M_n})$ precisely is hard, but obtaining a non-trivial lower bound for this
quantity would improve the lower bound of Theorem 3.

Figure 4: Cascade channel equivalent to the deletion+insertion channel. The input $X^n$ passes through a deletion channel (deletion with prob. $d$) followed by an insertion channel (insertion with prob. $i' = \frac{i}{1-d}$), producing the output $Y^{M_n}$.

6 The Deletion+insertion Channel 

Recall that this channel is defined by three parameters $(d, i, \alpha)$. Each input bit undergoes a deletion with
probability $d$, a duplication with probability $i\alpha$, or a complementary insertion with probability $i\bar{\alpha}$. Note that
each input bit is deleted with probability $d$; given that a particular bit is not deleted, the probability that it
undergoes an insertion is $\frac{i}{1-d}$. Therefore, one can think of the channel as a cascade of two channels, as shown
in Figure 4. The first channel is a deletion channel that deletes each bit independently with probability $d$.
The second channel is an insertion channel with parameters $(i', \alpha)$, where $i' = \frac{i}{1-d}$. We prove the equivalence
of this cascade decomposition below.

Claim: The deletion+insertion channel is equivalent to the cascade channel in the sense that both
have the same transition probability $P(Y \mid X)$.

Proof. For an $n$-bit input sequence, define the deletion-insertion pattern $A^n = (A_1, A_2, \ldots, A_n)$ of the channel
as the sequence where $A_i$ indicates whether the channel introduces a deletion/duplication/complementary
insertion/no modification in bit $i$ of the input. Note that if the underlying probability space is $(\Omega, \mathcal{F}, P)$,
the realization $\omega \in \Omega$ determines the deletion-insertion pattern $A^n(\omega)$. We calculate the probability of any
specified pattern occurring in a) the deletion+insertion channel, and b) the cascade channel.

Consider a deletion-insertion pattern $\tilde{A}^n$ with $k$ deletions at positions $a_1, a_2, \ldots, a_k$, $l$ duplications at
positions $b_1, \ldots, b_l$, and $m$ complementary insertions at positions $c_1, \ldots, c_m$. The probability of this pattern
occurring in the deletion+insertion channel is

$$P_{\text{del+ins}}(A^n(\omega) = \tilde{A}^n) = d^k (i\alpha)^l (i\bar{\alpha})^m (1 - d - i)^{n-k-l-m}.$$

The probability of this pattern occurring in the cascade channel of Figure 4 is

$$\begin{aligned} P_{\text{cascade}}(A^n(\omega) = \tilde{A}^n) &\overset{(a)}{=} \left[d^k (1-d)^{n-k}\right] \cdot \left[(i'\alpha)^l (i'\bar{\alpha})^m (1 - i')^{n-k-l-m}\right] \\ &= d^k (1-d)^{n-k} \left(\frac{i\alpha}{1-d}\right)^{l} \left(\frac{i\bar{\alpha}}{1-d}\right)^{m} \left(\frac{1-d-i}{1-d}\right)^{n-k-l-m} \\ &= d^k (i\alpha)^l (i\bar{\alpha})^m (1 - d - i)^{n-k-l-m}, \end{aligned} \tag{41}$$

where the first term in (a) is the probability of deletions occurring in the specified positions in the first
channel, and the second term is the probability of the insertions occurring in the specified positions in the
second channel. Hence every deletion-insertion pattern has the same probability in both the deletion+insertion
channel and the cascade channel. This implies that the two channels have the same transition probability. □
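The algebra behind this equivalence is easy to confirm numerically. The sketch below is a plain check rather than anything from the paper: the function names are hypothetical, and it assumes a pattern with k deletions, l duplications and m complementary insertions among n input bits, with the second stage of the cascade using insertion probability i/(1-d).

```python
import math

def p_direct(n, k, l, m, d, i, alpha):
    """Pattern probability in the deletion+insertion channel."""
    return d ** k * (i * alpha) ** l * (i * (1 - alpha)) ** m * (1 - d - i) ** (n - k - l - m)

def p_cascade(n, k, l, m, d, i, alpha):
    """Pattern probability in the deletion channel followed by the insertion channel."""
    i_prime = i / (1 - d)                      # insertion probability seen by surviving bits
    first = d ** k * (1 - d) ** (n - k)        # which bits the first stage deletes
    second = ((i_prime * alpha) ** l * (i_prime * (1 - alpha)) ** m
              * (1 - i_prime) ** (n - k - l - m))
    return first * second

args = (10, 2, 3, 1, 0.1, 0.2, 0.8)            # hypothetical pattern and parameters
print(p_direct(*args), p_cascade(*args))
print(math.isclose(p_direct(*args), p_cascade(*args), rel_tol=1e-12))
```

The factor (1-d)^(n-k) from the first stage cancels against the denominators introduced by i/(1-d) in the second stage, which is why the two values coincide.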

To obtain a lower bound on the capacity, we use two auxiliary sequences $T^{M_n} = (T_1, \ldots, T_{M_n})$ and
$S^{M_n+1} = (S_1, \ldots, S_{M_n+1})$. As in Section 4.2, $T^{M_n}$ indicates the complementary insertions in $Y^{M_n}$: $T_j = 1$ if
$Y_j$ is a complementary insertion, and $T_j = 0$ otherwise. As in Section 5, $S^{M_n+1}$ indicates the positions of the
missing runs: $S_j = k$ if $k$ runs were completely deleted between $Y_{j-1}$ and $Y_j$. Using these auxiliary sequences,
we can decompose $H_P(X^n \mid Y^{M_n})$ as

$$\begin{aligned} H_P(X^n \mid Y^{M_n}) &= H_P(X^n, T^{M_n}, S^{M_n+1} \mid Y^{M_n}) - H_P(T^{M_n}, S^{M_n+1} \mid X^n, Y^{M_n}) \\ &= H_P(T^{M_n} \mid Y^{M_n}) + H_P(S^{M_n+1} \mid T^{M_n}, Y^{M_n}) + H_P(X^n \mid S^{M_n+1}, T^{M_n}, Y^{M_n}) - H_P(T^{M_n}, S^{M_n+1} \mid X^n, Y^{M_n}) \\ &\le H_P(T^{M_n} \mid Y^{M_n}) + H_P(S^{M_n+1} \mid T^{M_n}, Y^{M_n}) + H_P(X^n \mid S^{M_n+1}, \tilde{Y}^{M_n}) - H_P(T^{M_n}, S^{M_n+1} \mid X^n, Y^{M_n}), \end{aligned} \tag{42}$$

where $\tilde{Y}^{M_n}$ is the sequence formed by flipping the complementary insertions in $Y^{M_n}$. The inequality in the
last line of (42) holds because $\tilde{Y}^{M_n}$ is a function of $(T^{M_n}, Y^{M_n})$. We therefore have

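$$\begin{aligned} \limsup_{n\to\infty} \frac{1}{n}H_P(X^n \mid Y^{M_n}) &\le \limsup_{n\to\infty} \frac{1}{n}\left(H_P(T^{M_n} \mid Y^{M_n}) + H_P(S^{M_n+1} \mid T^{M_n}, Y^{M_n}) + H_P(X^n \mid S^{M_n+1}, \tilde{Y}^{M_n})\right) \\ &\le \limsup_{n\to\infty} \frac{H_P(T^{M_n} \mid Y^{M_n})}{n} + \limsup_{n\to\infty} \frac{H_P(S^{M_n+1} \mid T^{M_n}, Y^{M_n})}{n} + \limsup_{n\to\infty} \frac{H_P(X^n \mid S^{M_n+1}, \tilde{Y}^{M_n})}{n}. \end{aligned} \tag{43}$$

Using this upper bound for $\limsup_{n\to\infty} \frac{1}{n}H_P(X^n \mid Y^{M_n})$ in (4), we obtain a lower bound on the capacity of the
deletion+insertion channel.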

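Lemma 6.1. $\limsup_{n\to\infty} \frac{1}{n}H_P(T^{M_n} \mid Y^{M_n}) = (1 - d + i) \limsup_{m\to\infty} \frac{1}{m}H_P(T^m \mid Y^m)$.

Proof. The proof follows the same steps as that of Lemma 4.1 with two changes: $T^{M_n}$ replaces $I^{M_n}$, and we
note that $\frac{M_n}{n}$ converges almost surely to $(1 - d + i)$ for the deletion+insertion channel. □

Lemma 6.2. $\limsup_{m\to\infty} \frac{1}{m}H_P(T^m \mid Y^m) \le \lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1})$, where

$$\lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1}) = \frac{\bar{q}(1-d) + q i\bar{\alpha}}{1 - d + i}\, h\!\left(\frac{i\bar{\alpha}}{\bar{q}(1-d) + q i\bar{\alpha}}\right), \quad \text{and} \quad q = \frac{\gamma + d - 2\gamma d}{1 + d - 2\gamma d}.$$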



Proof. We have

$$\frac{H_P(T^m \mid Y^m)}{m} = \frac{\sum_{j=1}^{m} H_P(T_j \mid T^{j-1}, Y^m)}{m} \le \frac{\sum_{j=1}^{m} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1})}{m}. \tag{44}$$

Therefore

$$\limsup_{m\to\infty} \frac{H_P(T^m \mid Y^m)}{m} \le \limsup_{m\to\infty} \frac{\sum_{j=1}^{m} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1})}{m} = \lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1}), \tag{45}$$

provided the limit exists. From the cascade representation in Figure 4, we see that the insertions are introduced
by the second channel in the cascade, an insertion channel with parameters $(i', \alpha)$. The input to this insertion
channel is a process $Z = \{Z_m\}_{m\ge 1}$, which is the output of the first channel in the cascade. From Proposition
5.1, $Z$ is a first-order Markov process with parameter $q = \frac{\gamma + d - 2\gamma d}{1 + d - 2\gamma d}$.

Therefore, we need to calculate $\lim_{j\to\infty} H(T_j \mid T_{j-1}, Y_j, Y_{j-1})$ where $Y$ is the output when a first-order
Markov process with parameter $q$ is transmitted through an insertion channel with parameters $(i', \alpha)$. But we
have already computed $\lim_{j\to\infty} H(T_j \mid T_{j-1}, Y_j, Y_{j-1})$ in Lemma 4.6 for an insertion channel with parameters
$(i, \alpha)$ with a first-order Markov input with parameter $\gamma$. Hence, in Lemma 4.6 we can replace $\gamma$ by $q$, and $i$
by $i'$ to obtain

$$\lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1}) = \frac{\bar{q} + q i'\bar{\alpha}}{1 + i'}\, h\!\left(\frac{i'\bar{\alpha}}{\bar{q} + q i'\bar{\alpha}}\right).$$

Substituting $i' = \frac{i}{1-d}$ and simplifying gives the statement of the lemma. □

Lemma 6.3. $\limsup_{n\to\infty} \frac{1}{n}H_P(S^{M_n+1} \mid T^{M_n}, Y^{M_n}) = (1 - d + i) \limsup_{m\to\infty} \frac{1}{m}H_P(S^m \mid T^m, Y^m)$.

Proof. The proof is along the same lines as that of Lemma 4.1; here we use the uniform integrability of the
sequence $\{-\frac{1}{n} \log P(S^{M_n+1} \mid T^{M_n}, Y^{M_n})\}$ along with the fact that $\frac{M_n}{n} \to (1 - d + i)$ almost surely. The uniform
integrability follows from Lemma 2.2 since $\text{Supp}(S^{M_n+1} \mid T^{M_n}, Y^{M_n})$ is upper bounded by $2^n$ for the reasons
explained in Section 3.1. □



Lemma 6.4. limsup,„^^ iffp(5™|T™, y™) < lim.^oo i?p(5, Fj, T,) = ^(^1+^2 - logj 9), 
where 

^ » ^7 + d-27rf {l~l)d (l-7)(l-rf) 

* l+d-27d' l-7d' ^ (l-7d)2 ' 

0B{l-i'a)^ f i'a + (l-i'a)q + i'aq\ f i'a + (1 - i'a)q + i'aq 



+ ((1 — i'a)j9 + i'a/S + i'a) log 



i'a + (1 — i'a)q + i' aq 
i'a + (1 - i'a)-^e + i'aji 



9^ 8(1 -i'a), f(l-i'a)q + i'aq\ 9Bi'a, f (1 - i'a)q + i'aq 
- 1-9^ [ f^il-m) ) + 1^ '"^^ [ Wa 

+ (/.7^ + (l-/a)/3)log,( ;^^-^^;_^,^^^ j. 
_Proo/. See Appendix [CT] □ 

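We now determine the limiting behavior of $\frac{1}{n}H(X^n \mid S^{M_n+1}, \tilde{Y}^{M_n})$. By flipping the complementary
insertions in $Y^{M_n}$ to obtain $\tilde{Y}^{M_n}$, we have removed the extra runs introduced by the channel. Using $S^{M_n+1}$, we
can augment $\tilde{Y}^{M_n}$ by adding the positions of the deleted runs to obtain a sequence $Y'$ which contains the
same number of runs as $X^n$. $Y'$ can be represented in terms of its run-lengths as $(L^{Y'}_1, \ldots, L^{Y'}_{R_n})$, where we
emphasize that $L^{Y'}_1, L^{Y'}_2, \ldots$ can take value 0 as well. To summarize, we have

$$X^n \leftrightarrow (L^X_1, \ldots, L^X_{R_n}), \qquad (S^{M_n+1}, \tilde{Y}^{M_n}) \leftrightarrow (L^{Y'}_1, \ldots, L^{Y'}_{R_n}). \tag{46}$$

Thus, for all $n$,

$$H_P(X^n \mid S^{M_n+1}, \tilde{Y}^{M_n}) = H_P(L^X_1, \ldots, L^X_{R_n} \mid L^{Y'}_1, \ldots, L^{Y'}_{R_n}). \tag{47}$$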

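Proposition 6.1. The process $\{L^X, L^{Y'}\} = \{(L^X_1, L^{Y'}_1), (L^X_2, L^{Y'}_2), \ldots\}$ is an i.i.d. process characterized by
the following joint distribution for all $j \ge 1$:

$$\begin{aligned} P(L^X_j = r) &= \gamma^{r-1}(1-\gamma), \qquad r = 1, 2, \ldots \\ P(L^{Y'}_j = s \mid L^X_j = r) &= \sum_{n_i \in \mathcal{I}} \binom{r}{n_i,\; r + n_i - s,\; s - 2n_i}\, i^{n_i}\, d^{\,r + n_i - s}\, (1 - d - i)^{s - 2n_i}, \qquad 0 \le s \le 2r, \end{aligned} \tag{48}$$

where $\mathcal{I}$, the set of possible values for the number of insertions $n_i$, is given by

$$\mathcal{I} = \{0, 1, \ldots, \lfloor \tfrac{s}{2} \rfloor\} \text{ for } s \le r, \quad \text{and} \quad \{s - r, \ldots, \lfloor \tfrac{s}{2} \rfloor\} \text{ for } s > r.$$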



Proof. Since $X$ is a Markov process, $\{L^X_j\}_{j\ge 1}$ are independent with

$$P(L^X_j = r) = \gamma^{r-1}(1-\gamma), \qquad r = 1, 2, \ldots$$

Since there is a one-to-one correspondence between the runs of $X$ and the runs of $Y'$, we can think of
each $L^{Y'}_j$ being obtained by passing a run of length $L^X_j$ through a discrete memoryless channel. For a pair
$(L^X_j = r, L^{Y'}_j = s)$, if the number of insertions is $n_i$, the number of deletions is easily seen to be $r + n_i - s$.
Since there can be at most one insertion after each input bit, no more than half the bits in an output run
can be insertions; hence the maximum value of $n_i$ is $\lfloor \frac{s}{2} \rfloor$. The minimum value of $n_i$ is zero for $s \le r$, and
$s - r$ for $s > r$. Using these together with the fact that each bit can independently undergo an insertion with
probability $i$, a deletion with probability $d$, or no change with probability $1 - d - i$, the transition probability
of the memoryless run-length channel is given by the second line of (48). □
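The counting argument in this proof can be checked numerically. The sketch below is a hypothetical helper, not code from the paper: it assumes that a run of r input bits produces an output run of length s when n_i bits receive an insertion, r + n_i - s bits are deleted and s - 2 n_i bits pass unchanged, and it sums the corresponding multinomial probabilities over the admissible values of n_i.

```python
from math import factorial

def run_transition(r, s, d, i):
    """P(output run length = s | input run length = r) for the run-length channel."""
    total = 0.0
    for n_i in range(max(0, s - r), s // 2 + 1):
        n_d = r + n_i - s                      # bits of the run that are deleted
        n_u = s - 2 * n_i                      # bits that pass through unchanged
        coef = factorial(r) // (factorial(n_i) * factorial(n_d) * factorial(n_u))
        total += coef * i ** n_i * d ** n_d * (1 - d - i) ** n_u
    return total

d, i = 0.1, 0.2                                # hypothetical channel parameters
row = [run_transition(5, s, d, i) for s in range(0, 11)]
print(sum(row))
print(run_transition(1, 0, d, i), run_transition(1, 1, d, i), run_transition(1, 2, d, i))
```

For a run of length one the three outcomes are a deletion, no change, or an insertion, so the last line recovers d, 1 - d - i and i; for longer runs each row sums to one by the multinomial theorem.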



Lemma 6.5. $\limsup_{n\to\infty} \frac{1}{n}H_P(X^n \mid S^{M_n+1}, \tilde{Y}^{M_n}) = (1 - \gamma)H_P(L^X \mid L^{Y'})$, where the joint distribution of
$(L^X, L^{Y'})$ is given by (48).

Proof. The proof is identical to that of Lemma 5.2. □



Theorem 4. The capacity of the deletion+insertion channel can be lower bounded as 



C{d,i,a)> max h{"/)~(q{l'-d)+qia)h 

0<7<1 



(7(1 — d) + qia 



-{l-d){A,+A,-^^^log, 0)-{l-j)HpiLf\Ll' 



where $q, \beta, \theta, A_1, A_2$ are defined in Lemma 6.4 and $H_P(L^X \mid L^{Y'})$ is computed using the joint distribution in (48).

Proof. The result is obtained by using Lemmas 6.1 - 6.5 in (43), and substituting the resulting bound in (4). □

The lower bound is plotted in Figure 5 for various values of $d = i$, for $\alpha = 0.8$ and for $\alpha = 1$. For Theorem
4, we used the sequence $T^{M_n}$ to indicate the positions of complementary insertions together with $S^{M_n+1}$ to
indicate deleted runs. We can obtain another lower bound on the deletion+insertion capacity by using the
sequence $I^{M_n}$ instead of $T^{M_n}$, as in Section 4.1. This bound can be derived in a straightforward manner by
combining the techniques of Sections 4.1 and 5 and is omitted.



7 Conclusion 

The framework used in this paper suggests several directions for further progress on computing the capacity 
of channels with synchronization errors: 

• There are a few different ways to sharpen the bounds for the insertion channel and the deletion+insertion
channel. One target is the inequality in (19): creating the sequence $\tilde{Y}^{M_n}$ ensures one-to-one correspondence
between input and output runs, but is not an optimal way to use the positions of complementary
insertions. Is there a better way to use the knowledge of $(T^{M_n}, Y^{M_n})$?

Figure 5: Lower bound on the deletion+insertion capacity $C(d, i, \alpha)$ for $d = i$ (horizontal axis: deletion prob. $d$ = insertion prob. $i$).



• When the input distribution is Markov, the deletion channel ensures that the output distribution is also
Markov. But the presence of insertions results in an output process that is not Markov, which is the
reason an exact expression for the limiting behavior of $\frac{1}{m}H(T^m \mid Y^m)$ could not be obtained in Lemma
4.6. A better bound for this term would improve the capacity lower bound.

• In the decomposition of $H(X^n \mid Y^{M_n})$ in (26) and (42), the penalty terms, $H(S^{M_n+1} \mid X^n, Y^{M_n})$ for the
deletion channel and $H(T^{M_n}, S^{M_n+1} \mid X^n, Y^{M_n})$ for the deletion+insertion channel, were dropped to obtain
the capacity bounds. These terms are hard to compute precisely, but any non-trivial lower bound for
these terms would improve the capacity bounds.

• Another direction is to investigate the performance of more general input distributions with i.i.d. runs.
For example, a distribution that is constant for small values and then decays geometrically may be a good
run-length distribution for deletion channels since it decreases the probability of a run being completely
deleted. A result on the structure of the optimal input distribution for small values of $i$ and $d$ (in the
spirit of [10, 11]) would be very useful. Such a result could be combined with the approach used here to
obtain good estimates of the capacity for small insertion and deletion probabilities.

• For the insertion channel, if we fix the insertion probability $i$ and vary $\alpha$, intuition suggests that the
channel with $\alpha = 1$ has the largest capacity since there is always one-to-one correspondence between
input and output runs, and no auxiliary sequences are needed. The capacity lower bound plotted in
Figure 2 seems to verify this intuition. Formally proving this conjecture would yield an upper bound on
the insertion capacity $C(i, \alpha)$ for $\alpha < 1$ since very tight bounds are known for the case of $\alpha = 1$ [12].

• One could also obtain upper bounds on the insertion capacity by considering a genie-aided decoder with
access to the auxiliary sequence $T^{M_n}$, as done in [4] for the deletion channel. The task then is to obtain an
upper bound on the capacity per unit cost of the equivalent DMC.

• The framework used here can be extended to derive bounds for channels with substitution errors in 
addition to deletions and insertions. For this, we would need an additional auxiliary sequence, e.g., a 
sequence that indicates the positions of the bit flips. 

• The problem of synchronization also appears in file backup and file sharing [2, 3], where distributed nodes
with different versions of the same file want to synchronize their versions. For example, consider two
nodes, with the first node having source $X$ and the second having source $Y$, which is an edited version
of $X$. The edits may include deletions, insertions, and substitutions. A basic question is: To update $Y$
to $X$, what is the minimum communication rate needed from the first node to the second? This is a
distributed source coding problem, and it can be shown that the optimal rate is given by the limiting
behavior of $\frac{1}{n}H(X \mid Y)$. The results derived in this paper provide bounds on this optimal rate for the case
where $X$ is Markov, and the edit model $P(Y \mid X)$ is one with i.i.d. deletions and insertions. Extension of
these results to edit models with substitution errors would yield rates to benchmark the performance of
practical file synchronization tools such as rsync [18].

APPENDIX

A Insertion Channel

A.1 Proof of Lemma 4.1

We begin by noting that $\frac{M_n}{n} \to (1 + i)$ almost surely, due to the strong law of large numbers. We have



n 



E 



= E 
< E 

= E 



n 

1 p^jM„^Y^'-) 



^iogP(/^-^"|r*^").(i{^,(,^_,^,,^^^} + i{^^(,^_,,^,,^^)}) 



E 



log 

n 

1 p(^J'ri{l+i+e) -j^ri(l+i+e)'j 



-^iogP(/^^"|r*^'^).i{^^(,^,_^_,^,^^)} 



1 p(^jn{l+i+t) -j^n(l+i+c)^ 



E 



E 



1 p(^jri(l+i+<i) yn(l+i+e)^ 



■ l{^^(l+i-e,l+i+€)} 



E 



1 



iogP(/*^"|r*'^").i{^^(,^^_^_,^^^^3^ 



(49)

We first examine the third term in (49). The size of the support of $-\frac{1}{n}\log P(I^{M_n} \mid Y^{M_n})$ is at most $2^{2n}$,
since $I^{M_n}$ is a binary sequence of length at most $2n$. Hence, from Lemma 2.2, $\{-\frac{1}{n} \log P(I^{M_n} \mid Y^{M_n})\}_{n \ge 1}$ is
uniformly integrable. From Lemma 2.1, for any $\epsilon > 0$, there exists some $\delta > 0$ such that

$$E\left[-\frac{1}{n}\log P(I^{M_n} \mid Y^{M_n}) \cdot \mathbf{1}_{\left\{\frac{M_n}{n} \notin (1+i-\epsilon,\, 1+i+\epsilon)\right\}}\right] < \epsilon \tag{50}$$

whenever $\Pr\left(\left\{\frac{M_n}{n} \notin (1 + i - \epsilon, 1 + i + \epsilon)\right\}\right) < \delta$. Since $\frac{M_n}{n} \to 1 + i$ almost surely, $\Pr\left(\left\{\frac{M_n}{n} \notin (1 + i - \epsilon, 1 + i + \epsilon)\right\}\right)$
is less than $\delta$ for all sufficiently large $n$. Thus (50) is true for all sufficiently large $n$. Similarly, the second term
can be shown to be smaller than $\epsilon$ for all sufficiently large $n$. Therefore, for all sufficiently large $n$, (49) becomes

-ffp(J^"|F^") < ' ^ '-+e 

n n 

= ^—^ >- + li/p(r 1+*+^) ^ . I jn(l+.-e)^„(l+.-e)) ^ 

V I nfl + i — e) n n(l+i-e) + l' n(l + i-c) + ll ' ' 



< (1 + ^-6) ' U^e + e. 

n[l + t — e) 



(51) 



where (a) holds because I^^l'l'^.'^^} . -, and Y^}-}'^^'^^} . . can each take on at most 2^^^ different values. Hence 

^ ^ n{l+i — €)-\-l n[l-\-i — e)-\-l 



limsup -i?p(/*^"|r^^") < 5e + (1 + i + e) limsup — i7p(/™|r'^ 



Since $\epsilon > 0$ is arbitrary, we let $\epsilon \to 0$ to obtain

$$\limsup_{n\to\infty} \frac{1}{n}H_P(I^{M_n} \mid Y^{M_n}) \le (1 + i) \limsup_{m\to\infty} \frac{1}{m}H_P(I^m \mid Y^m). \tag{52}$$



Using steps similar to (49), we have 
1 



-Hp{P""\Y 



E 



_1 p(lM„^Y^^'^) 



E 



--iogP(/^^"|y*^")-i 



> E 
= E 



1 p(/"(i+*^"5) e)^ 



E 



-^iogP(/^-^"|r^-^").i{M.^(i+._,,i+,+,)} 



E 



1 /'(/"(i+''^'^) yCi+'^is)^ 



E 



1 



iogP(/*'^"|r*'^")-i 



(53) 



Using arguments identical to the ones following (49), one can show that the last two terms of (53) are smaller
than $\epsilon$ in absolute value for all sufficiently large $n$, leading to

$$\limsup_{n\to\infty} \frac{1}{n}H_P(I^{M_n} \mid Y^{M_n}) \ge (1 + i) \limsup_{m\to\infty} \frac{1}{m}H_P(I^m \mid Y^m). \tag{54}$$

Combining (52) and (54) completes the proof of the lemma. □

A.2 Proof of Lemma 4.2

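We have

$$\frac{1}{m}H_P(I^m \mid Y^m) = \frac{1}{m}\sum_{j=1}^{m} H_P(I_j \mid I^{j-1}, Y^m) \le \frac{1}{m}\sum_{j=1}^{m} H_P(I_j \mid I_{j-1}, Y_j, Y_{j-1}, Y_{j-2}), \tag{55}$$

where the inequality holds because conditioning cannot increase the entropy. Therefore

$$\limsup_{m\to\infty} \frac{1}{m}H_P(I^m \mid Y^m) \le \limsup_{m\to\infty} \frac{1}{m}\sum_{j=1}^{m} H_P(I_j \mid I_{j-1}, Y_j, Y_{j-1}, Y_{j-2}) = \lim_{j\to\infty} H_P(I_j \mid I_{j-1}, Y_j, Y_{j-1}, Y_{j-2}), \tag{56}$$

provided the limit exists. We now show that $\lim_{j\to\infty} H_P(I_j \mid I_{j-1}, Y_j, Y_{j-1}, Y_{j-2})$ exists and is given by (18).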

From Proposition 4.1 the process {I,Y} is characterized by a Markov chain with state at time j given 
by (/j, Yj, Yj-i). For any e > 0, the distribution P{Ij,Yj,Yj_i) is at most e (in total variation norm) from 
the stationary joint distribution tt given by (17) for all sufficiently large j. The conditional distribution 
P(/j , Yj_2, Yj-i) is given by (16). Due to the continuity of the entropy function, this implies 

lim i/p(/, Y,-, Y,_i, yj_2) = lim i7,(/,|/,_i, Y^, Y,_i, Y,_2). 



where tt refers to the stationary joint distribution on (Yj_2, Yj-i, /j, Y^), given by (17) and (16). 

$H_\pi(I_j \mid I_{j-1}, Y_j, Y_{j-1}, Y_{j-2})$ can be computed as follows. First, we note that $I_j = 0$ whenever $I_{j-1} = 1$.
Therefore



i?,(/,|/,_i, Y„ Y,_i, Y,_2) = J2 = 0' ^1 = ^^-1 = ^^-2 = y)H{Ij\I,-^ = 0, Y, = y, Y,_i = y, Y,_2 = v) 

y=o 



From (17) and (16), we have 





-1 = 


Q,Y, 


= y^Yj-i 




= y)H{lj\l,- 


i=0,Y,= 


= y^Yj^i = y, Yj_2 = y) 




-1 = 


0,Y, 


= y>^j-i 


= V^Y,^2 


= y)H{Ij\I,- 


i=0,Y,= 


= y-^Yj-i = y,Yj^2 = y) 




-1 = 


Q,Yj 




= y^ Yj_2 


^y)H{I,\I,_ 


1=0,Y,: 


= y-,Y j-i = y, Yj_2 = y) 
















(57) 


-2 = 


v) = 




_i=0,Y,_ 


-i = y^Yj_ 


-2 = y) [P{Y, 


y> = 


=0,Y,_i =y,Y,_ 










+P{Yj 


= y,ij = o\ij 


-i=0,Y, 


-1 = y^Yj^2 ^ y)\ 



2(1 + «) 



(ia + 17), 



and 



ff(/,| Vi - 0, Y, - y, Y,_i = y, Y,_2 = y) = 



+ ^7 



The remaining terms in (57) can be similarly calculated to obtain (18).

(58)

(59)

□

A.3 Proof of Lemma 4.4

We have 

i/p(/"(i+*)|r"(i+»),X") = Hp{Ij+i\P,Y''^^+'\X") 

(a) "(^+*) 

> E E P{I'-\lj ^0)PiY'-\{Yj,Y,+,,Y,+2,Y,+3) ^ iy,y,y,y), (X,, , = (y, y, - 0, /■''i) 

• Hp{I,+,\Y^-\p-\lj - 0, (r„ y,+3) = {y, y, y, y), (Xfc, , Xk^+,,Xk^+2) - (y, y, y)) 

(60) 

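where $k_j$ is the index of the input bit that corresponds to $Y_j$. ($k_j$ is uniquely determined given $I_1, \ldots, I_j$.) (a)
is obtained as follows. Since $I_{j+1} = 0$ whenever $I_j = 1$, the entropy terms in the sum are non-zero only when
$I_j = 0$. The inequality appears because we sum only over those indices $j$ that satisfy

$$I_j = 0, \quad (Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y), \quad (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y).$$

For such indices, the input bit $X_{k_j}$ corresponds to $Y_j$, and $X_{k_j+2}$ to $Y_{j+3}$, the uncertainty in $I_{j+1}$ being
whether $X_{k_j+1}$ corresponds to $Y_{j+1}$ or $Y_{j+2}$. We have

$$\begin{aligned} &P\big((Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y),\ (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y) \mid I_j = 0\big) \\ &\quad = P\big((X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y)\big) \cdot P\big((Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y) \mid (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y),\ I_j = 0\big) \\ &\quad = \tfrac{1}{2}\bar{\gamma}^2 \cdot \big(i\bar{\alpha} + (1 - i)i\alpha\big), \end{aligned} \tag{61}$$

where the term $i\bar{\alpha}$ corresponds to the case where $Y_{j+1}$ is a complementary insertion ($I_{j+1} = 1$, $I_{j+2} = 0$), and the
term $(1 - i)i\alpha$ to the case where $Y_{j+2}$ is a duplication ($I_{j+1} = 0$, $I_{j+2} = 1$). Therefore,

$$H_P\big(I_{j+1} \mid I^{j-1}, I_j = 0, (Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y), (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y)\big) = h\!\left(\frac{\bar{\alpha}}{\bar{\alpha} + (1 - i)\alpha}\right). \tag{62}$$

As explained in Section 4, $\{I_j\}_{j\ge 1}$ is a Markov chain with $P(I_j = 0)$ converging to $\frac{1}{1+i}$ as $j \to \infty$. Substituting
this along with (61) and (62) in (60), and summing over $y \in \{0, 1\}$, we obtain

$$\liminf_{n\to\infty} \frac{1}{n}H_P(I^{n(1+i)} \mid Y^{n(1+i)}, X^n) \ge \frac{n(1+i)}{n} \cdot \frac{\bar{\gamma}^2}{1+i} \cdot \big(i\bar{\alpha} + (1 - i)i\alpha\big)\, h\!\left(\frac{\bar{\alpha}}{\bar{\alpha} + (1 - i)\alpha}\right) = \bar{\gamma}^2\, i(\bar{\alpha} + \bar{i}\alpha)\, h\!\left(\frac{\bar{\alpha}}{\bar{\alpha} + \bar{i}\alpha}\right).$$

□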



A.4 Proof of Lemma 4.6

We have

$$\frac{1}{m} H_P(T^m \mid Y^m) = \frac{1}{m} \sum_{j=1}^{m} H_P(T_j \mid T^{j-1}, Y^m) \le \frac{1}{m} \sum_{j=1}^{m} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1}). \tag{63}$$

Therefore

$$\limsup_{m\to\infty} \frac{1}{m} H_P(T^m \mid Y^m) \le \limsup_{m\to\infty} \frac{1}{m} \sum_{j=1}^{m} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1}) = \lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1}), \tag{64}$$

provided the limit exists. We now show that $\lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_j, Y_{j-1})$ exists and is given by (22).

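Note that $T_j = 0$ whenever $T_{j-1} = 1$, since we cannot have two consecutive insertions. Also, $T_j = 0$ whenever $Y_j = Y_{j-1}$, since $T_j = 1$ only when $Y_j$ is a complementary insertion. Thus we have for all $j \ge 2$:

$$
\begin{aligned}
H(T_j \mid T_{j-1}, Y_j, Y_{j-1}) &= P(T_{j-1} = 0, Y_j = 1, Y_{j-1} = 0)\, H(T_j \mid T_{j-1} = 0, Y_j = 1, Y_{j-1} = 0) \\
&\quad + P(T_{j-1} = 0, Y_j = 0, Y_{j-1} = 1)\, H(T_j \mid T_{j-1} = 0, Y_j = 0, Y_{j-1} = 1).
\end{aligned} \tag{65}
$$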

Note that for all $j \ge 1$, $P(T_j = 1) = P(I_j = 1)\bar{\alpha}$, where $I_j = 1$ if $Y_j$ is an inserted bit, and $I_j = 0$ otherwise. Therefore,

$$P(T_j = 0) = 1 - P(I_j = 1)\bar{\alpha}, \quad j \ge 1. \tag{66}$$

Note that the binary-valued process $\{I_j\}_{j \ge 1}$ is a Markov chain with transition probabilities

$$\Pr(I_{j+1} = 1 \mid I_j = 0) = 1 - \Pr(I_{j+1} = 0 \mid I_j = 0) = i, \qquad \Pr(I_{j+1} = 1 \mid I_j = 1) = 1 - \Pr(I_{j+1} = 0 \mid I_j = 1) = 0. \tag{67}$$

For $i \in (0, 1)$, this is an irreducible, aperiodic Markov chain. Hence a unique stationary distribution $\pi$ exists, which is given by

$$\pi(I_j = 1) = 1 - \pi(I_j = 0) = \frac{i}{1+i}. \tag{68}$$

Hence for any $\epsilon > 0$,

$$\left| P(I_j = 1) - \frac{i}{1+i} \right| < \epsilon \quad \text{and} \quad \left| \Pr(I_j = 0) - \frac{1}{1+i} \right| < \epsilon \tag{69}$$

for all sufficiently large $j$. Using this in (66), for all sufficiently large $j$, the distribution $P(T_j)$ is within total variation norm $\epsilon$ of the following stationary distribution:

$$\pi(T_j = 0) = \frac{1 + i\alpha}{1+i}, \qquad \pi(T_j = 1) = \frac{i\bar{\alpha}}{1+i}. \tag{70}$$
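The convergence claimed in (68)-(70) can be checked numerically. The sketch below iterates the one-step recursion implied by (67), starting from $P(I_1 = 1) = 0$ (the first output bit cannot be an insertion), and then applies (66). The function names and the parameter values in the usage note are illustrative assumptions, not part of the paper.

```python
def stationary_insertion_indicator(i, steps=200):
    """Iterate P(I_j = 1) under the transition probabilities in (67).

    Returns (P(I = 0), P(I = 1)) after `steps` transitions, starting
    from P(I_1 = 1) = 0.
    """
    p1 = 0.0
    for _ in range(steps):
        # I_{j+1} = 1 requires I_j = 0 and an insertion (probability i).
        p1 = (1.0 - p1) * i
    return 1.0 - p1, p1


def stationary_T(i, alpha, steps=200):
    """Apply (66): P(T = 1) = P(I = 1) * (1 - alpha)."""
    _, p1 = stationary_insertion_indicator(i, steps)
    t1 = p1 * (1.0 - alpha)
    return 1.0 - t1, t1
```

For example, with $i = 0.3$ and $\alpha = 0.6$ the iteration settles at $P(I_j = 1) \approx 0.3/1.3$ and $P(T_j = 0) \approx 1.18/1.3$, in agreement with (68) and (70). The error contracts by a factor $i$ per step, so the default number of steps is generous unless $i$ is very close to 1.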



Further, we have $P(Y_j = y \mid T_j = 1) = P(Y_j = y \mid T_j = 0) = 0.5$ for $y \in \{0, 1\}$, since both the input distribution and the insertion process are symmetric in 0 and 1. Hence the stationary distribution for $(T_{j-1}, Y_{j-1})$ is

$$\pi(T_{j-1} = 0, Y_{j-1} = y) = \frac{1 + i\alpha}{2(1+i)}, \qquad \pi(T_{j-1} = 1, Y_{j-1} = y) = \frac{i\bar{\alpha}}{2(1+i)}. \tag{71}$$

Next, we determine the conditional distribution $P(Y_j, T_j \mid Y_{j-1} = y, T_{j-1} = 0)$ for $y \in \{0, 1\}$. We have



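$$
\begin{aligned}
&P(T_j = 0, Y_j = y \mid Y_{j-1} = y, T_{j-1} = 0) \\
&= P(T_j = 0, Y_j = y, I_{j-1} = 1 \mid Y_{j-1} = y, T_{j-1} = 0) + P(T_j = 0, Y_j = y, I_{j-1} = 0 \mid Y_{j-1} = y, T_{j-1} = 0) \\
&= P(I_{j-1} = 1 \mid T_{j-1} = 0) \cdot P(T_j = 0, Y_j = y \mid I_{j-1} = 1, T_{j-1} = 0, Y_{j-1} = y) \\
&\quad + P(I_{j-1} = 0 \mid T_{j-1} = 0) \cdot P(T_j = 0, Y_j = y \mid I_{j-1} = 0, T_{j-1} = 0, Y_{j-1} = y) \\
&\overset{(a)}{=} \frac{P(I_{j-1} = 1) P(T_{j-1} = 0 \mid I_{j-1} = 1)}{P(T_{j-1} = 0)}\, \gamma + \frac{P(I_{j-1} = 0) P(T_{j-1} = 0 \mid I_{j-1} = 0)}{P(T_{j-1} = 0)}\, \big((1-i)\gamma + i\alpha\big) \\
&\overset{(b)}{=} \frac{P(I_{j-1} = 1)\,\alpha}{1 - \bar{\alpha} P(I_{j-1} = 1)}\, \gamma + \frac{P(I_{j-1} = 0)}{1 - \bar{\alpha} P(I_{j-1} = 1)}\, \big((1-i)\gamma + i\alpha\big).
\end{aligned} \tag{72}
$$

In the above, (b) is obtained using (66). (a) is obtained as follows. The event $(I_{j-1} = 1, T_{j-1} = 0, Y_{j-1} = y)$ implies $Y_{j-1}$ is a duplication, and hence $Y_{j-2} = y$ corresponds to an input bit (say $X_a$), and $Y_j$ is the next input bit $X_{a+1}$. The probability that $X_{a+1} = X_a$ is $\gamma$. Hence $P(T_j = 0, Y_j = y \mid I_{j-1} = 1, T_{j-1} = 0, Y_{j-1} = y) = \gamma$.

When $(I_{j-1} = 0, T_{j-1} = 0, Y_{j-1} = y)$, $Y_{j-1}$ corresponds to an input bit, say $X_b$. Conditioned on this, the event $(T_j = 0, Y_j = y)$ can occur in two ways:

• $Y_j$ is the next input bit $X_{b+1}$ and is equal to $y$. This event has probability $(1-i)\gamma$.

• $Y_j$ is a duplication of $Y_{j-1}$. This event has probability $i\alpha$.

Hence $P(T_j = 0, Y_j = y \mid I_{j-1} = 0, T_{j-1} = 0, Y_{j-1} = y) = (1-i)\gamma + i\alpha$. Similarly, we calculate

$$
\begin{aligned}
P(T_j = 0, Y_j = \bar{y} \mid Y_{j-1} = y, T_{j-1} = 0) &= P(I_{j-1} = 1 \mid T_{j-1} = 0)\, P(T_j = 0, Y_j = \bar{y} \mid I_{j-1} = 1, T_{j-1} = 0, Y_{j-1} = y) \\
&\quad + P(I_{j-1} = 0 \mid T_{j-1} = 0)\, P(T_j = 0, Y_j = \bar{y} \mid I_{j-1} = 0, T_{j-1} = 0, Y_{j-1} = y) \\
&= \frac{P(I_{j-1} = 1) P(T_{j-1} = 0 \mid I_{j-1} = 1)}{P(T_{j-1} = 0)}\, \bar{\gamma} + \frac{P(I_{j-1} = 0) P(T_{j-1} = 0 \mid I_{j-1} = 0)}{P(T_{j-1} = 0)}\, (1-i)\bar{\gamma} \\
&= \frac{P(I_{j-1} = 1)\,\alpha}{1 - \bar{\alpha} P(I_{j-1} = 1)}\, \bar{\gamma} + \frac{P(I_{j-1} = 0)}{1 - \bar{\alpha} P(I_{j-1} = 1)}\, (1-i)\bar{\gamma},
\end{aligned} \tag{73}
$$

$$
\begin{aligned}
P(T_j = 1, Y_j = \bar{y} \mid Y_{j-1} = y, T_{j-1} = 0) &= P(I_{j-1} = 1 \mid T_{j-1} = 0)\, P(T_j = 1, Y_j = \bar{y} \mid I_{j-1} = 1, T_{j-1} = 0, Y_{j-1} = y) \\
&\quad + P(I_{j-1} = 0 \mid T_{j-1} = 0)\, P(T_j = 1, Y_j = \bar{y} \mid I_{j-1} = 0, T_{j-1} = 0, Y_{j-1} = y) \\
&= \frac{P(I_{j-1} = 1) P(T_{j-1} = 0 \mid I_{j-1} = 1)}{P(T_{j-1} = 0)} \cdot 0 + \frac{P(I_{j-1} = 0) P(T_{j-1} = 0 \mid I_{j-1} = 0)}{P(T_{j-1} = 0)} \cdot i\bar{\alpha} \\
&= \frac{P(I_{j-1} = 0)}{1 - \bar{\alpha} P(I_{j-1} = 1)}\, i\bar{\alpha},
\end{aligned} \tag{74}
$$

and

$$P(T_j = 1, Y_j = y \mid Y_{j-1} = y, T_{j-1} = 0) = 0. \tag{75}$$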


Using (69) in equations (72)-(75), we see that for all sufficiently large $j$, the distribution $P(T_j, Y_j \mid Y_{j-1} = y, T_{j-1} = 0)$ is within a total variation norm $\epsilon$ from the following stationary distribution:

$$
\begin{aligned}
\pi(T_j = 0, Y_j = y \mid Y_{j-1} = y, T_{j-1} = 0) &= \frac{i\alpha(1+\gamma) + (1-i)\gamma}{1 + i\alpha}, \\
\pi(T_j = 0, Y_j = \bar{y} \mid Y_{j-1} = y, T_{j-1} = 0) &= \frac{\bar{\gamma}(1 - i\bar{\alpha})}{1 + i\alpha}, \\
\pi(T_j = 1, Y_j = \bar{y} \mid Y_{j-1} = y, T_{j-1} = 0) &= \frac{i\bar{\alpha}}{1 + i\alpha}, \\
\pi(T_j = 1, Y_j = y \mid Y_{j-1} = y, T_{j-1} = 0) &= 0.
\end{aligned} \tag{76}
$$
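As a sanity check on (76), one can simulate the channel and compare empirical transition frequencies with the stationary values. The sketch below assumes a pure insertion channel ($d = 0$) driven by a symmetric first-order Markov input with $P(X_{k+1} = X_k) = \gamma$. The function names, the seed, and the parameter values in the usage note are illustrative choices rather than anything prescribed by the paper.

```python
import random


def simulate_insertion_channel(n, i, alpha, gamma, seed=0):
    """Return (y, t): output bits and complementary-insertion flags."""
    rng = random.Random(seed)
    y, t = [], []
    x = rng.randrange(2)
    for _ in range(n):
        y.append(x)          # the input bit itself is never deleted here
        t.append(0)
        if rng.random() < i:  # an extra bit is inserted after x
            if rng.random() < alpha:
                y.append(x)       # duplication
                t.append(0)
            else:
                y.append(1 - x)   # complementary insertion
                t.append(1)
        if rng.random() >= gamma:  # Markov input: flip with prob 1 - gamma
            x = 1 - x
    return y, t


def empirical_conditional(y, t):
    """Estimate P(T_j, [Y_j == Y_{j-1}] | T_{j-1} = 0) from one sample path."""
    counts = {(0, True): 0, (0, False): 0, (1, True): 0, (1, False): 0}
    total = 0
    for j in range(1, len(y)):
        if t[j - 1] == 0:
            counts[(t[j], y[j] == y[j - 1])] += 1
            total += 1
    return {key: value / total for key, value in counts.items()}


def stationary_conditional(i, alpha, gamma):
    """The four values listed in (76), keyed by (T_j, Y_j == Y_{j-1})."""
    den = 1 + i * alpha
    return {
        (0, True): (i * alpha * (1 + gamma) + (1 - i) * gamma) / den,
        (0, False): (1 - gamma) * (1 - i * (1 - alpha)) / den,
        (1, False): i * (1 - alpha) / den,
        (1, True): 0.0,
    }
```

With, say, `n = 200000`, `i = 0.3`, `alpha = 0.6` and `gamma = 0.7`, the empirical frequencies from `empirical_conditional` should agree with `stationary_conditional` to within sampling error, and the four stationary values sum to one.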



Due to the continuity of the entropy function in the joint distribution, we therefore have

$$\lim_{j\to\infty} H_P(T_j \mid T_{j-1}, Y_{j-1}, Y_j) = H_\pi(T_j \mid T_{j-1}, Y_{j-1}, Y_j),$$

where the joint distribution $\pi(T_{j-1}, Y_{j-1}, Y_j, T_j)$ is given by (71) and (76). Using this in (65), one can compute $H_\pi(T_j \mid T_{j-1}, Y_{j-1}, Y_j)$ to obtain the result in the lemma. □
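The final computation can be sketched numerically as follows. The code combines the stationary masses in (71) with the conditional values in (76) and plugs them into the two symmetric terms of (65). It is an illustration under those formulas only; no claim is made here about the exact form in which the lemma states the result, and the function names are assumptions.

```python
from math import log2


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def limiting_entropy_T(i, alpha, gamma):
    """H_pi(T_j | T_{j-1}, Y_{j-1}, Y_j) assembled from (65), (71), (76)."""
    abar, gbar = 1 - alpha, 1 - gamma
    # (71): pi(T_{j-1} = 0, Y_{j-1} = y), the same for y = 0 and y = 1.
    prior = (1 + i * alpha) / (2 * (1 + i))
    # (76): conditional masses of the two events with Y_j != Y_{j-1}.
    p_t0 = gbar * (1 - i * abar) / (1 + i * alpha)
    p_t1 = i * abar / (1 + i * alpha)
    if p_t0 + p_t1 == 0.0:
        return 0.0
    mass = prior * (p_t0 + p_t1)
    # (65): only the terms with Y_j != Y_{j-1} carry entropy; y = 0, 1 match.
    return 2 * mass * h(p_t1 / (p_t0 + p_t1))
```

When $\alpha = 1$ (duplications only) the function returns zero, as expected, since no complementary insertions occur and $T_j$ is deterministic.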



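A.5 Proof of Lemma 4.7

Due to (23), it is enough to show that $\frac{1}{n} H_P(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})$ converges to $(1-\gamma) H_P(L^X \mid L^Y)$, where $R_n$ denotes the number of runs in $X^n$. Since $\{(L^X_1, L^Y_1), (L^X_2, L^Y_2), \ldots\}$ is an i.i.d. process, from the strong law of large numbers, we have

$$\lim_{m\to\infty} -\frac{1}{m} \log \Pr(L^X_1, \ldots, L^X_m \mid L^Y_1, \ldots, L^Y_m) = H_P(L^X \mid L^Y) \quad \text{a.s.} \tag{77}$$

Further, the normalized number of input runs satisfies $\frac{R_n}{n} \to (1-\gamma)$ almost surely. Using the above in Lemma 2.4, we obtain

$$\lim_{n\to\infty} -\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n}) = (1-\gamma) H_P(L^X \mid L^Y) \quad \text{a.s.} \tag{78}$$

We now argue that $-\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})$ is uniformly integrable. $\mathrm{Supp}(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})$ can be upper bounded by $2^n$, since the random sequence $(L^X_1, \ldots, L^X_{R_n})$ is equivalent to $X^n$, which can take on at most $2^n$ values. Hence, from Lemma 2.2, $-\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})$ is uniformly integrable. Using this together with (78) in Lemma 2.3, we conclude that

$$
\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} H_P(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n}) &= \lim_{n\to\infty} E\left[-\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})\right] \\
&= E\left[\lim_{n\to\infty} -\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})\right] \\
&= (1-\gamma) H_P(L^X \mid L^Y).
\end{aligned} \tag{79}
$$

□

A.6 Proof of Lemma 4.9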

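We have

$$
\begin{aligned}
H_P(T^{n(1+i)} \mid Y^{n(1+i)}, X^n) &= \sum_{j} H_P(T_{j+1} \mid T^{j}, Y^{n(1+i)}, X^n) \\
&\overset{(a)}{\ge} \sum_{j=1}^{n(1+i)} \sum_{y \in \{0,1\}} P\big(Y^{j-1}, T^{j-1}, I_j = 0,\, (Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y),\, (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y)\big) \\
&\qquad \cdot H_P\big(T_{j+1} \mid Y^{j-1}, T^{j-1}, I_j = 0,\, (Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y),\, (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y)\big).
\end{aligned} \tag{80}
$$

(a) is obtained as follows. Since $T_{j+1} = 0$ whenever $I_j = 1$, the entropy terms in the second line are non-zero only when $I_j = 0$. The inequality appears because we only sum over indices $j$ such that

$$I_j = 0, \quad (Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y), \quad (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y),$$

where $k_j$ is the index of the input bit that corresponds to $Y_j$. This is uniquely determined because, given $T^j$, there is a one-to-one correspondence between the input and output runs until bit $Y_j$. Then $Y_j = y$ and $Y_{j+1} = \bar{y}$ implies $X_{k_j}$ is the last input bit in the run containing $Y_j$. Therefore, input bit $X_{k_j}$ corresponds to $Y_j$, and $X_{k_j+2}$ corresponds to $Y_{j+3}$, the uncertainty being whether $X_{k_j+1}$ corresponds to $Y_{j+1}$ or $Y_{j+2}$. We have

$$
\begin{aligned}
&P\big((Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y),\, (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y) \mid I_j = 0\big) \\
&= P\big((X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y) \mid I_j = 0\big) \cdot P\big((Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y) \mid (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y),\, I_j = 0\big) \\
&= \frac{\bar{\gamma}^2}{2} \cdot \big(i\bar{\alpha} + (1-i)i\alpha\big),
\end{aligned} \tag{81}
$$

where the term $i\bar{\alpha}$ corresponds to the case where $Y_{j+1}$ is a complementary insertion ($T_{j+1} = 1$), and the term $(1-i)i\alpha$ to the case where $Y_{j+2}$ is a duplication ($T_{j+1} = 0$). Consequently,

$$H_P\big(T_{j+1} \mid Y^{j-1}, T^{j-1}, I_j = 0,\, (Y_j, Y_{j+1}, Y_{j+2}, Y_{j+3}) = (y, \bar{y}, \bar{y}, y),\, (X_{k_j}, X_{k_j+1}, X_{k_j+2}) = (y, \bar{y}, y)\big) = h\!\left(\frac{i\bar{\alpha}}{i\bar{\alpha} + (1-i)i\alpha}\right). \tag{82}$$

Substituting (81) and (82) in (80) and using the fact that $P(I_j = 0) \to \frac{1}{1+i}$, we obtain

$$\liminf_{n\to\infty} \frac{1}{n} H_P(T^{n(1+i)} \mid Y^{n(1+i)}, X^n) \ge \frac{1}{n} \cdot n(1+i) \cdot \frac{\bar{\gamma}^2}{1+i} \cdot \big(i\bar{\alpha} + (1-i)i\alpha\big)\, h\!\left(\frac{\bar{\alpha}}{\bar{\alpha} + (1-i)\alpha}\right).$$

□

B Deletion Channel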



B.1 Proof of Lemma 5.1

We first show that almost surely

$$\lim_{n\to\infty} -\frac{1}{n} \log P(S^{m_n} \mid Y^{m_n}) = (1-d) H_P(S_2 \mid Y_1 Y_2).$$

From Propositions 5.1 and 5.2, $\{Y_m\}_{m \ge 1}$ and $\{(S_m, Y_m)\}_{m \ge 1}$ are both ergodic Markov chains with stationary transition probabilities. Therefore, from the Shannon-McMillan-Breiman theorem [19], we have

$$\lim_{m\to\infty} -\frac{1}{m} \log P(Y^m) = H_P(Y_2 \mid Y_1) \quad \text{a.s.,} \tag{83}$$

$$\lim_{m\to\infty} -\frac{1}{m} \log P(S^m, Y^m) = H_P(S_2, Y_2 \mid Y_1) \quad \text{a.s.} \tag{84}$$

Subtracting (83) from (84), we get

$$\lim_{m\to\infty} -\frac{1}{m} \log P(S^m \mid Y^m) = H_P(S_2 \mid Y_2 Y_1) \quad \text{a.s.} \tag{85}$$

Further, we have $\lim_{n\to\infty} \frac{m_n}{n} = 1 - d$ almost surely. Using this with (85) in Lemma 2.4, we conclude that

$$\lim_{n\to\infty} -\frac{1}{n} \log P(S^{m_n} \mid Y^{m_n}) = (1-d) H_P(S_2 \mid Y_1 Y_2) \quad \text{a.s.} \tag{86}$$
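The almost-sure convergence in (83) can be illustrated by simulation. The sketch below assumes, purely for illustration, that the output process is a symmetric binary first-order Markov chain with $P(Y_m = Y_{m-1}) = q$ and a uniform first bit, so that $H_P(Y_2 \mid Y_1) = h(q)$. The value of `q`, the seed, and the function names are hypothetical choices made for this example.

```python
import random
from math import log2


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def sample_entropy_rate(m, q, seed=0):
    """Return -(1/m) log2 P(Y^m) for one sample path of length m.

    The chain is symmetric, so the path probability depends only on
    which transitions repeat the previous bit.
    """
    rng = random.Random(seed)
    log_prob = log2(0.5)  # uniform first bit
    for _ in range(m - 1):
        stay = rng.random() < q
        log_prob += log2(q if stay else 1 - q)
    return -log_prob / m
```

For a path of length `m = 200000` with `q = 0.7`, the returned value should lie close to `h(0.7)`, with fluctuations that shrink roughly like $1/\sqrt{m}$.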



We now argue that $-\frac{1}{n} \log P(S^{m_n} \mid Y^{m_n})$ is uniformly integrable. $\mathrm{Supp}(S^{m_n} \mid Y^{m_n})$ can be upper bounded by representing $S^{m_n}$ as

$$\underbrace{x\,x \cdots x}_{S_1}\; Y\; \underbrace{x\,x \cdots x}_{S_2}\; Y\; x \cdots\; Y\; \underbrace{x\,x \cdots x}_{S_{m_n+1}}$$

where the $Y$'s represent the bits of the sequence $Y^{m_n}$, and each $x$ represents a missing run. Since the maximum length of the above binary sequence is $n$, we have $\mathrm{Supp}(S^{m_n} \mid Y^{m_n}) < 2^n$. Hence, from Lemma 2.2, $-\frac{1}{n} \log \Pr(S^{m_n} \mid Y^{m_n})$ is uniformly integrable.

Using this together with (86) in Lemma 2.3, we conclude that

$$\lim_{n\to\infty} \frac{1}{n} H_P(S^{m_n} \mid Y^{m_n}) = \lim_{n\to\infty} E\left[-\frac{1}{n} \log P(S^{m_n} \mid Y^{m_n})\right] = E\left[\lim_{n\to\infty} -\frac{1}{n} \log P(S^{m_n} \mid Y^{m_n})\right] = (1-d) H_P(S_2 \mid Y_1 Y_2).$$

Thus we have

$$\lim_{n\to\infty} \frac{1}{n} H_P(S^{m_n+1} \mid Y^{m_n}) = \lim_{n\to\infty} \frac{1}{n} H_P(S^{m_n} \mid Y^{m_n}) + \lim_{n\to\infty} \frac{1}{n} H_P(S_{m_n+1} \mid Y^{m_n}, S^{m_n}) = (1-d) H_P(S_2 \mid Y_1 Y_2) + 0.$$

□

B.2 Proof of Lemma 5.2

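Due to (35), it is enough to show that $\frac{1}{n} H_P(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})$ converges to $(1-\gamma) H_P(L^X \mid L^Y)$, where $R_n$ denotes the number of runs in $X^n$.

Since $\{(L^X_1, L^Y_1), (L^X_2, L^Y_2), \ldots\}$ is an i.i.d. process, from the strong law of large numbers, we have

$$\lim_{m\to\infty} -\frac{1}{m} \log \Pr(L^X_1, \ldots, L^X_m \mid L^Y_1, \ldots, L^Y_m) = H_P(L^X \mid L^Y) \quad \text{a.s.} \tag{87}$$

Further, the normalized number of input runs satisfies $\frac{R_n}{n} \to (1-\gamma)$ almost surely. Using the above in Lemma 2.4, we obtain

$$\lim_{n\to\infty} -\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n}) = (1-\gamma) H_P(L^X \mid L^Y) \quad \text{a.s.} \tag{88}$$

We now argue that $-\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})$ is uniformly integrable. $\mathrm{Supp}(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})$ can be upper bounded by $2^n$, since the random sequence $(L^X_1, \ldots, L^X_{R_n})$ is equivalent to $X^n$, which can take on at most $2^n$ values. Hence, from Lemma 2.2, $-\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})$ is uniformly integrable.

Using this together with (88) in Lemma 2.3, we conclude that

$$
\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} H_P(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n}) &= \lim_{n\to\infty} E\left[-\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})\right] \\
&= E\left[\lim_{n\to\infty} -\frac{1}{n} \log \Pr(L^X_1, \ldots, L^X_{R_n} \mid L^Y_1, \ldots, L^Y_{R_n})\right] \\
&= (1-\gamma) H_P(L^X \mid L^Y).
\end{aligned} \tag{89}
$$

□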



C Deletion+Insertion Channel 

C.1 Proof of Lemma 6.4

We have

$$\frac{1}{m} H(S^m \mid T^m, Y^m) = \sum_{j=1}^{m} \frac{1}{m} H(S_j \mid S^{j-1}, T^m, Y^m) \le \sum_{j=1}^{m} \frac{1}{m} H(S_j \mid Y_{j-1}, Y_j, T_j).$$

We will show that $\lim_{j\to\infty} H(S_j \mid Y_{j-1}, Y_j, T_j)$ exists and obtain an analytical expression for it. For all $j$,

$$
\begin{aligned}
H(S_j \mid Y_{j-1}, Y_j, T_j) &= P(Y_{j-1}, Y_j, T_j = 0)\, H(S_j \mid Y_{j-1}, Y_j, T_j = 0) \\
&= \sum_{y \in \{0,1\}} \Big[ P(Y_{j-1} = y, Y_j = y, T_j = 0)\, H(S_j \mid Y_{j-1} = y, Y_j = y, T_j = 0) \\
&\qquad\qquad + P(Y_{j-1} = \bar{y}, Y_j = y, T_j = 0)\, H(S_j \mid Y_{j-1} = \bar{y}, Y_j = y, T_j = 0) \Big].
\end{aligned} \tag{90}
$$

The first equality above holds since $T_j = 1$ implies $Y_j$ is an inserted bit, and so no deleted runs occur between $Y_{j-1}$ and $Y_j$. $P(Y_{j-1}, Y_j, T_j = 0)$ can be computed as follows.

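$$
\begin{aligned}
P(Y_{j-1} = y, Y_j = y, T_j = 0) &= P(I_{j-1} = 0, Y_{j-1} = y, T_j = 0, Y_j = y) + P(I_{j-1} = 1, T_{j-1} = 0, Y_{j-1} = y, T_j = 0, Y_j = y) \\
&\quad + P(I_{j-1} = 1, T_{j-1} = 1, Y_{j-1} = y, T_j = 0, Y_j = y) \\
&\overset{(a)}{=} \tfrac{1}{2} P(I_{j-1} = 0) \big( P(I_j = 0 \mid I_{j-1} = 0)\, q + P(I_j = 1 \mid I_{j-1} = 0)\, \alpha \big) + \tfrac{1}{2} P(I_{j-1} = 1)\, \alpha q + \tfrac{1}{2} P(I_{j-1} = 1)\, \bar{\alpha} (1 - q) \\
&\xrightarrow{j \to \infty} \frac{1}{2} \left[ \frac{1}{1+i'} \big( (1 - i') q + i' \alpha \big) + \frac{i'}{1+i'}\, \alpha q + \frac{i'}{1+i'}\, \bar{\alpha} (1 - q) \right].
\end{aligned} \tag{91}
$$

The last two terms in (a) are obtained by noting that $I_{j-1} = 1$ implies $Y_{j-1}$ is an insertion and hence $T_j = 0$. In this case, $Y_{j-2}$ corresponds to the last non-inserted bit before $Y_j$; it equals $y$ when $Y_{j-1}$ is a duplication and $\bar{y}$ when $Y_{j-1}$ is a complementary insertion. The last line is due to the fact that $\{I_j\}_{j \ge 1}$ is a Markov chain that converges to the stationary distribution $P(I_j = 1) = \frac{i'}{1+i'}$, $P(I_j = 0) = \frac{1}{1+i'}$. Thus for all sufficiently large $j$, $P(Y_{j-1} = y, Y_j = y, T_j = 0)$ is at most $\epsilon$ in total variation norm from the stationary distribution

$$\pi(Y_{j-1} = y, Y_j = y, T_j = 0) = \frac{1}{2} \left[ \frac{1}{1+i'} \big( (1 - i') q + i' \alpha \big) + \frac{i'}{1+i'}\, \alpha q + \frac{i'}{1+i'}\, \bar{\alpha} (1 - q) \right], \quad y \in \{0, 1\}.$$

Similarly, $P(Y_{j-1} = \bar{y}, Y_j = y, T_j = 0)$ converges to

$$
\begin{aligned}
\pi(Y_{j-1} = \bar{y}, Y_j = y, T_j = 0) &= \pi(I_{j-1} = 0, Y_{j-1} = \bar{y}, T_j = 0, Y_j = y) + \pi(I_{j-1} = 1, T_{j-1} = 0, Y_{j-1} = \bar{y}, T_j = 0, Y_j = y) \\
&\quad + \pi(I_{j-1} = 1, T_{j-1} = 1, Y_{j-1} = \bar{y}, T_j = 0, Y_j = y) \\
&= \tfrac{1}{2} \pi(I_{j-1} = 0)\, P(I_j = 0 \mid I_{j-1} = 0) (1 - q) + \tfrac{1}{2} \pi(I_{j-1} = 1)\, \alpha (1 - q) + \tfrac{1}{2} \pi(I_{j-1} = 1)\, \bar{\alpha} q.
\end{aligned} \tag{92}
$$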



We next determine the joint distributions $\pi(S_j, Y_{j-1} = y, Y_j = y, T_j = 0)$ and $\pi(S_j, Y_{j-1} = \bar{y}, Y_j = y, T_j = 0)$ to compute $H_\pi(S_j \mid Y_{j-1} = y, Y_j = y, T_j = 0)$ and $H_\pi(S_j \mid Y_{j-1} = \bar{y}, Y_j = y, T_j = 0)$ in (90). For $k = 0, 1, \ldots$, we have

$$
\begin{aligned}
\pi(S_j = k, Y_{j-1} = y, Y_j = y, T_j = 0) &= \pi(I_{j-1} = 0, Y_{j-1} = y, T_j = 0, Y_j = y, S_j = k) \\
&\quad + \pi(I_{j-1} = 1, T_{j-1} = 0, Y_{j-1} = y, T_j = 0, Y_j = y, S_j = k) \\
&\quad + \pi(I_{j-1} = 1, T_{j-1} = 1, Y_{j-1} = y, T_j = 0, Y_j = y, S_j = k).
\end{aligned} \tag{93}
$$

The first term corresponds to $Y_{j-1}$ being an original input bit, the second term to $Y_{j-1}$ being a duplication, and the third to $Y_{j-1}$ being a complementary insertion, respectively. Each of these terms can be calculated in a manner very similar to equations (29) and (30) of Proposition 5.2. Hence we obtain



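$$
\pi(S_j = k, Y_{j-1} = y, Y_j = y, T_j = 0) =
\begin{cases}
\dfrac{1}{2(1+i')} \left[ i'\alpha + (1 - i'\bar{\alpha}) \dfrac{\gamma (1-d)}{1 - \gamma d} + i'\bar{\alpha}\, \dfrac{(1-\gamma)(1-d)}{(1 - \gamma d)^2} \right], & k = 0, \\[2ex]
\dfrac{1 - i'\bar{\alpha}}{2(1+i')}\, \dfrac{(1-\gamma)(1-d)}{(1 - \gamma d)^2} \left( \dfrac{d(1-\gamma)}{1 - \gamma d} \right)^{k}, & k = 1, 3, \ldots, \\[2ex]
\dfrac{i'\bar{\alpha}}{2(1+i')}\, \dfrac{(1-\gamma)(1-d)}{(1 - \gamma d)^2} \left( \dfrac{d(1-\gamma)}{1 - \gamma d} \right)^{k}, & k = 2, 4, \ldots
\end{cases} \tag{94}
$$

Similarly, we also determine

$$
\pi(S_j = k, Y_{j-1} = \bar{y}, Y_j = y, T_j = 0) =
\begin{cases}
\dfrac{1}{2(1+i')} \left[ (1 - i'\bar{\alpha})\, \dfrac{(1-\gamma)(1-d)}{(1 - \gamma d)^2} + i'\bar{\alpha}\, \dfrac{\gamma (1-d)}{1 - \gamma d} \right], & k = 0, \\[2ex]
\dfrac{i'\bar{\alpha}}{2(1+i')}\, \dfrac{(1-\gamma)(1-d)}{(1 - \gamma d)^2} \left( \dfrac{d(1-\gamma)}{1 - \gamma d} \right)^{k}, & k = 1, 3, \ldots, \\[2ex]
\dfrac{1 - i'\bar{\alpha}}{2(1+i')}\, \dfrac{(1-\gamma)(1-d)}{(1 - \gamma d)^2} \left( \dfrac{d(1-\gamma)}{1 - \gamma d} \right)^{k}, & k = 2, 4, \ldots
\end{cases} \tag{95}
$$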
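A useful consistency check is that summing (94) and (95) over $k$ recovers the marginals in (91) and (92). The sketch below does this numerically. It assumes that (94) and (95) have the piecewise form transcribed in the code comments, and that $q$, the probability that two consecutive non-inserted output bits agree, is obtained by summing the same run-deletion terms; that expression for $q$ is an assumption of this sketch, since the lemma statement defining it is not reproduced in this appendix. All function names are illustrative, and `ip` stands for $i'$.

```python
def run_terms(gamma, d):
    """Return (same, beta, theta) used in the piecewise expressions."""
    same = gamma * (1 - d) / (1 - gamma * d)            # next output in same run
    beta = (1 - gamma) * (1 - d) / (1 - gamma * d) ** 2
    theta = d * (1 - gamma) / (1 - gamma * d)           # a whole run is deleted
    return same, beta, theta


def output_repeat_probability(gamma, d):
    """Assumed q: next non-inserted output bit equals the previous one."""
    same, beta, theta = run_terms(gamma, d)
    return same + beta * theta / (1 - theta ** 2)


def pmf_same(k, ip, alpha, gamma, d):
    """Transcription of (94): pi(S_j = k, Y_{j-1} = y, Y_j = y, T_j = 0)."""
    same, beta, theta = run_terms(gamma, d)
    abar, c = 1 - alpha, 1 / (2 * (1 + ip))
    if k == 0:
        return c * (ip * alpha + (1 - ip * abar) * same + ip * abar * beta)
    weight = (1 - ip * abar) if k % 2 == 1 else ip * abar
    return c * weight * beta * theta ** k


def pmf_diff(k, ip, alpha, gamma, d):
    """Transcription of (95): pi(S_j = k, Y_{j-1} = ybar, Y_j = y, T_j = 0)."""
    same, beta, theta = run_terms(gamma, d)
    abar, c = 1 - alpha, 1 / (2 * (1 + ip))
    if k == 0:
        return c * ((1 - ip * abar) * beta + ip * abar * same)
    weight = ip * abar if k % 2 == 1 else (1 - ip * abar)
    return c * weight * beta * theta ** k


def marginal_same(ip, alpha, q):
    """Stationary value in (91)."""
    return 0.5 * (((1 - ip) * q + ip * alpha) + ip * alpha * q
                  + ip * (1 - alpha) * (1 - q)) / (1 + ip)


def marginal_diff(ip, alpha, q):
    """Stationary value in (92)."""
    return 0.5 * ((1 - ip) * (1 - q) + ip * alpha * (1 - q)
                  + ip * (1 - alpha) * q) / (1 + ip)
```

Summing `pmf_same` and `pmf_diff` over a few hundred values of `k` and comparing against `marginal_same` and `marginal_diff` gives agreement to floating-point precision for typical parameter values, since the geometric tail in $k$ decays quickly.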



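From (91) and (94), we can compute

$$
\begin{aligned}
&\pi(Y_{j-1} = y, Y_j = y, T_j = 0)\, H_\pi(S_j \mid Y_{j-1} = y, Y_j = y, T_j = 0) \\
&= \frac{1}{2(1+i')} \Bigg[ \left( i'\alpha + (1 - i'\bar{\alpha}) \frac{\gamma(1-d)}{1 - \gamma d} + i'\bar{\alpha}\beta \right) \log_2 \left( \frac{i'\alpha + (1 - i'\bar{\alpha}) q + i'\bar{\alpha}(1-q)}{i'\alpha + (1 - i'\bar{\alpha}) \frac{\gamma(1-d)}{1 - \gamma d} + i'\bar{\alpha}\beta} \right) \\
&\qquad + i'\bar{\alpha}\beta\, \frac{2\theta^2}{(1 - \theta^2)^2} \log_2 \frac{1}{\theta} + (1 - i'\bar{\alpha})\beta\, \frac{\theta(1 + \theta^2)}{(1 - \theta^2)^2} \log_2 \frac{1}{\theta} \\
&\qquad + \frac{\theta\beta(1 - i'\bar{\alpha})}{1 - \theta^2} \log_2 \left( \frac{i'\alpha + (1 - i'\bar{\alpha}) q + i'\bar{\alpha}(1-q)}{\beta(1 - i'\bar{\alpha})} \right) + \frac{\theta^2 \beta i'\bar{\alpha}}{1 - \theta^2} \log_2 \left( \frac{i'\alpha + (1 - i'\bar{\alpha}) q + i'\bar{\alpha}(1-q)}{\beta i'\bar{\alpha}} \right) \Bigg],
\end{aligned} \tag{96}
$$

where $\theta$ and $\beta$ are defined in the statement of the lemma. Similarly, from (92) and (95), one can compute

$$
\begin{aligned}
&\pi(Y_{j-1} = \bar{y}, Y_j = y, T_j = 0)\, H_\pi(S_j \mid Y_{j-1} = \bar{y}, Y_j = y, T_j = 0) \\
&= \frac{1}{2(1+i')} \Bigg[ \left( i'\bar{\alpha} \frac{\gamma(1-d)}{1 - \gamma d} + (1 - i'\bar{\alpha})\beta \right) \log_2 \left( \frac{(1 - i'\bar{\alpha})(1-q) + i'\bar{\alpha} q}{i'\bar{\alpha} \frac{\gamma(1-d)}{1 - \gamma d} + (1 - i'\bar{\alpha})\beta} \right) \\
&\qquad + i'\bar{\alpha}\beta\, \frac{\theta(1 + \theta^2)}{(1 - \theta^2)^2} \log_2 \frac{1}{\theta} + (1 - i'\bar{\alpha})\beta\, \frac{2\theta^2}{(1 - \theta^2)^2} \log_2 \frac{1}{\theta} \\
&\qquad + \frac{\theta^2 \beta(1 - i'\bar{\alpha})}{1 - \theta^2} \log_2 \left( \frac{(1 - i'\bar{\alpha})(1-q) + i'\bar{\alpha} q}{\beta(1 - i'\bar{\alpha})} \right) + \frac{\theta\beta i'\bar{\alpha}}{1 - \theta^2} \log_2 \left( \frac{(1 - i'\bar{\alpha})(1-q) + i'\bar{\alpha} q}{\beta i'\bar{\alpha}} \right) \Bigg].
\end{aligned} \tag{97}
$$

Substituting (96) and (97) in (90) completes the proof of the lemma. □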